Abstract
This paper introduces the basic ideas of the Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classification algorithms, together with two feature extraction methods for Kazakh. SVM, kNN, and Bayes classifiers are compared experimentally on Kazakh text categorization. The results show that SVM classifies Kazakh text better than kNN and Bayes. Because of the morphemes and inflectional forms of Kazakh words, segmenting affixes from words lowers both the precision and the recall of text categorization.
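As a rough illustration of the kNN idea and of the precision and recall measures named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes raw term counts as features and cosine similarity as the distance measure; the abstract does not state which features or weights the paper uses. The names `cosine`, `knn_predict`, `precision_recall`, and the toy data `TRAIN` (English placeholder tokens, not Kazakh data) are hypothetical.

```python
import math
from collections import Counter

# Toy training set: (tokens, label). Placeholder data for illustration only.
TRAIN = [
    (["goal", "match", "team"], "sport"),
    (["team", "coach", "match"], "sport"),
    (["bank", "loan", "rate"], "finance"),
    (["rate", "market", "bank"], "finance"),
]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, tokens, k=3):
    """Label a document by majority vote among its k most similar neighbors."""
    query = Counter(tokens)
    ranked = sorted(
        train,
        key=lambda item: cosine(Counter(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(gold, predicted, target):
    """Precision and recall for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under these assumptions, whether affixes are segmented would change the token lists passed to `knn_predict`, and therefore the term counts on which the similarity is computed.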
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2010, Issue 6, pp. 1676-1678 (3 pages)
Journal of Computer Applications
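Funding
National Natural Science Foundation of China (60763005)
Ministry of Education / State Language Commission research project on the standardization and informatization of minority languages and scripts (MZ115-92)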
Keywords
text categorization
Support Vector Machine (SVM)
feature selection
k-Nearest Neighbor (kNN)