Application of Rule-Based Automatic Classification in Text Classification (cited by: 20)

Published English title: Rule-based Automatic Category Application on Text Category
Abstract: Automatic text classification is the technique of assigning a text to one or more categories according to a given strategy. This paper first reviews three statistics-based automatic classification techniques (the k-nearest-neighbor classifier, the support vector machine classifier, and the naive Bayes classifier) and analyzes the strengths and weaknesses of statistics-based classification. Its main weakness is that classification precision tends to fall as the overlap between the classification features of different categories grows, and this limitation is especially pronounced in multi-level classification. To address this limitation and improve precision, we introduce rule-based automatic classification to refine and extend the statistical approach. Combining the strengths of the two techniques, we design and implement a hybrid classifier system, which the authors report performs very well.
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 4, pp. 9-14 (6 pages)
Keywords: computer application; Chinese information processing; text mining; text classification; rule-based classification
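
The abstract describes combining a rule-based classifier with a statistical one but does not say how the two are wired together. The following is a minimal sketch of one possible arrangement, not the authors' system: hand-written keyword rules are tried first, and a small k-nearest-neighbor classifier over bag-of-words cosine similarity is the fallback. The rules, the training texts, the category names, and all function names (`rule_classify`, `knn_classify`, `hybrid_classify`) are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical hand-written rules: (required keywords, category label).
RULES = [
    ({"goalkeeper"}, "sports/football"),
    ({"three-pointer"}, "sports/basketball"),
]

# Hypothetical toy training set for the statistical fallback.
# The two sub-categories deliberately share many words ("match", "scored", "team").
TRAIN = [
    ("the team won the match with a late goal", "sports/football"),
    ("the striker scored a goal in the match", "sports/football"),
    ("the guard scored in the final quarter of the match", "sports/basketball"),
    ("the team won after a rebound in the final quarter", "sports/basketball"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; real Chinese text needs word segmentation)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text, k=3):
    """Statistical classifier: similarity-weighted vote among the k nearest training texts."""
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(doc))), label) for doc, label in TRAIN),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


def rule_classify(text):
    """Rule-based classifier: return the label of the first rule whose keywords all occur, else None."""
    tokens = set(tokenize(text))
    for keywords, label in RULES:
        if keywords <= tokens:
            return label
    return None


def hybrid_classify(text, k=3):
    """Hybrid: trust a matching rule, otherwise fall back to the statistical classifier."""
    label = rule_classify(text)
    return label if label is not None else knn_classify(text, k)


if __name__ == "__main__":
    print(hybrid_classify("the goalkeeper saved it in the final quarter"))
    print(hybrid_classify("the striker scored a goal"))
```

In this toy data the text "the goalkeeper saved it in the final quarter" shares most of its words with the basketball examples, so `knn_classify` alone would label it `sports/basketball`; the keyword rule corrects it to `sports/football`. That mirrors the overlapping-feature problem the abstract attributes to statistical classifiers. Letting rules take priority is only one design choice; a real system might instead use rules to re-rank or veto the statistical result.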