
The Influence of Stop Word Removal on the Chinese Text Sentiment Classification Based on SVM Technology (cited by: 6)
Abstract: Mining unstructured information to analyze the sentiment of online reviews is an important approach. Working with Web customer reviews, this study uses four different stop word lists during text preprocessing, two different feature selection methods, the well-known TF-IDF weighting scheme, and a support vector machine (SVM) classifier with an RBF kernel to classify 4,000 hotel customer reviews collected from Ctrip (携程网). Based on the experiments, the paper analyzes how the choice of feature selection method and stop word list affects the sentiment classification of customer review text, and proposes an effective stop word list for sentiment text classification.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, PKU Core Journal, 2011, No. 4, pp. 347-352 (6 pages)
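Funding: National Social Science Fund of China (07BTQ010); Hubei Province projects (Z20091701, 2008d062, 2008244, 2007097, HB092-21); Wuhan City projects (200940833384-02, 20041007072-08); China Textile Industry Association (2007082)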
Keywords: customer review; sentiment classification; stop word list; feature selection; support vector machine
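
The preprocessing and weighting steps named in the abstract (stop word filtering followed by TF-IDF weighting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three pre-segmented reviews, the two-word stop list, and the helper names `remove_stop_words` and `tf_idf` are hypothetical, and the TF-IDF formula is one common variant (term frequency normalized by document length, times log(N/df)), since the record does not state the exact formula the paper used. The feature selection and RBF-kernel SVM steps are not shown.

```python
import math
from collections import Counter

# Hypothetical, already word-segmented reviews (not the paper's Ctrip corpus).
reviews = [
    ["房间", "很", "干净", "的"],
    ["服务", "很", "差", "的"],
    ["房间", "不", "干净"],
]

# Hypothetical stop word list; the paper compares four real lists.
stop_words = {"的", "很"}


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [t for t in tokens if t not in stop_words]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
             for t, c in counts.items()}
        )
    return weights


filtered = [remove_stop_words(r, stop_words) for r in reviews]
weights = tf_idf(filtered)
```

Swapping in a different `stop_words` set changes which terms survive into the feature space, which is the variable the study examines. In this toy example the negation 不 is kept only as an illustrative choice, not as a result reported in the paper.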
