
Study on Kazak Text Categorization Based on SVM (基于SVM的哈萨克语文本分类)
Cited by: 2
Abstract: This paper introduces the basic ideas behind the Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classification algorithms, together with two feature extraction methods for Kazakh. It then compares the SVM, kNN, and Bayes algorithms experimentally on Kazakh text categorization. The results show that SVM classifies Kazakh text better than kNN and Bayes. Because of the morphemic and inflectional structure of Kazakh words, segmenting Kazakh affixes lowers both the precision and the recall of text categorization.
Source: Journal of Computer Applications (《计算机应用》), 2010, No. 6, pp. 1676-1678 (3 pages). Indexed in CSCD and the Peking University Core Journals list.
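Funding: National Natural Science Foundation of China (60763005); Ministry of Education / State Language Commission research project on the standardization and informatization of minority languages and scripts (MZ115-92).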
Keywords: text categorization; Support Vector Machine (SVM); feature selection; k-Nearest Neighbor (kNN)
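The abstract compares classifiers by precision and recall, so a small illustration of those two measures alongside a kNN text classifier may help. The sketch below is not the paper's implementation: the whitespace tokenizer, the cosine-similarity bag-of-words representation, the English placeholder documents in `TRAIN`, and all function names are assumptions made for illustration only. The paper's own features, corpus, and SVM setup are not given in this record.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda doc: cosine(vectorize(doc[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def precision_recall(gold, pred, label):
    """Precision and recall for one class, given gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy corpus (English placeholders, not Kazakh data).
TRAIN = [
    ("football match goal team", "sport"),
    ("team wins the match", "sport"),
    ("bank interest rate market", "economy"),
    ("market price of stock", "economy"),
]
```

With this toy corpus, `knn_predict(TRAIN, "goal in the match")` votes among the three nearest documents, and `precision_recall` can then score a batch of such predictions against gold labels. Whether a word is kept whole or split into stem and affixes changes the tokens that `vectorize` sees, which is the kind of preprocessing choice the abstract reports as affecting precision and recall.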

