An Experimental Study of Text Classification Methods Based on a Two-Step Strategy
Abstract: Applying a two-step classification strategy with a naive Bayes classifier is known to improve the efficiency of two-class Chinese text classification. Building on that result, this paper studies three questions: (1) the conditions a classifier must satisfy to be used in two-step classification; (2) a theoretical analysis of three classifiers that can be used for two-step text classification; (3) an experimental comparison of pairwise combinations of Rocchio, naive Bayes, and KNN applied to multi-class English text classification. The experimental results show that Rocchio, naive Bayes, and KNN all satisfy the conditions for two-step classification, and that the best performance is achieved with KNN as the first-step classifier and naive Bayes as the second-step classifier.
Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), 2011, No. 4, pp. 35-38 (4 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (60703010); Natural Science Foundation of Chongqing (2009BB2079).
Keywords: text categorization; two-step strategy; Rocchio; naive Bayes; k-nearest neighbors (KNN)
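The abstract does not spell out how the two steps are connected, so the following is only a minimal sketch under one common reading of a two-step strategy: the first-step classifier labels the documents it is confident about, and the remaining uncertain documents are passed to the second-step classifier. The `margin` threshold, the class and function names, and the toy corpus are all hypothetical illustrations, not the authors' implementation or data. The pairing (KNN first, naive Bayes second) follows the combination the abstract reports as best.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


class KNNText:
    """k-nearest-neighbour classifier using cosine similarity on word counts."""

    def __init__(self, k=4):
        self.k = k
        self.examples = []  # list of (Counter of tokens, label)

    def fit(self, docs, labels):
        self.examples = [(Counter(tokenize(d)), y) for d, y in zip(docs, labels)]
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def scores(self, doc):
        """Per-class similarity summed over the k most similar training docs."""
        q = Counter(tokenize(doc))
        sims = [(self._cosine(q, vec), y) for vec, y in self.examples]
        nearest = sorted(sims, key=lambda p: p[0], reverse=True)[: self.k]
        totals = defaultdict(float)
        for sim, y in nearest:
            totals[y] += sim
        return totals


class NaiveBayesText:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(d))
        self.vocab = {t for c in self.word_counts.values() for t in c}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        best, best_lp = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            lp = math.log(n_y / n_docs)
            for t in tokens:
                lp += math.log((self.word_counts[y][t] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


def two_step_predict(first, second, doc, margin=0.2):
    """Assumed routing rule: accept the first-step label only when the
    normalized gap between the top two class scores reaches `margin`;
    otherwise defer to the second-step classifier."""
    totals = first.scores(doc)
    ranked = sorted(totals.items(), key=lambda p: p[1], reverse=True)
    norm = sum(totals.values())
    if ranked and norm > 0:
        top_label, top = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
        if (top - runner_up) / norm >= margin:
            return top_label, "first"
    return second.predict(doc), "second"


# Hypothetical toy corpus, for illustration only.
docs = [
    "stock market shares rise",
    "bank raises interest rates",
    "market investors buy shares",
    "team wins football match",
    "player scores goal in match",
    "coach praises team players",
]
labels = ["finance", "finance", "finance", "sport", "sport", "sport"]

knn = KNNText(k=4).fit(docs, labels)
nb = NaiveBayesText().fit(docs, labels)

for text in ["shares market rise", "market match goal"]:
    print(text, "->", two_step_predict(knn, nb, text))
```

Each call returns the predicted label together with the step that produced it, which makes it easy to see how many documents fall into the uncertain region for a given `margin`. Swapping in a Rocchio (centroid) classifier for either step would only require an object exposing the same `scores` or `predict` method.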
