
The Influence of Stoplist on the Chinese Text Sentiment Categorization (original title: 停用词表对中文文本情感分类的影响)
Cited by: 22
Abstract: This paper studies sentiment classification of texts from an automobile corpus using three feature selection methods, two term-weighting methods (Boolean and frequency), five stoplists, and a support vector machine (SVM) classifier. The experimental results show that the feature selection method, the weighting scheme, and the stoplist each affect sentiment classification to a different degree. Two stoplist settings have the largest effect and yield comparatively good classification results: treating every word other than adjectives, verbs, and adverbs as a stop word, and using no stoplist at all. Overall, information gain outperforms the other feature selection methods, and Boolean weighting outperforms frequency weighting for Chinese text sentiment classification.
Authors: Wang Suge (王素格), Wei Ying (魏英)
Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2008, No. 2, pp. 175-179 (5 pages); indexed in CSSCI and the Peking University core journal list
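Funding: National Natural Science Foundation of China (60573074); Natural Science Foundation of Shanxi Province (20041040); Shanxi Province Key Science and Technology Program (051129); Shanxi Higher Education Science and Technology Research and Development Project (200611002).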
Keywords: stop word; text sentiment classification; feature selection; support vector machine
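As a rough illustration of the pipeline the abstract describes, the Python sketch below shows three pieces: one of the five stoplist variants (every word except adjectives, verbs, and adverbs treated as a stop word), document-level information gain for feature selection, and Boolean versus frequency term weighting. It is not the authors' implementation. The PKU-style POS tags (`a`, `v`, `d`), the function names, and the toy reviews are assumptions made for illustration, and the SVM classification step is omitted.

```python
import math
from collections import Counter

# Assumption: text is already segmented and POS-tagged with PKU-style tags,
# where "a" = adjective, "v" = verb, "d" = adverb. This models only the
# "everything except adjectives, verbs and adverbs is a stop word" stoplist.
KEEP_POS = {"a", "v", "d"}


def apply_pos_stoplist(tagged_doc, keep_pos=KEEP_POS):
    """Drop every (word, pos) pair whose POS tag is not in keep_pos."""
    return [word for word, pos in tagged_doc if pos in keep_pos]


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Document-level IG(t) = H(C) - [P(t) H(C|t) + P(~t) H(C|~t)]."""
    classes = sorted(set(labels))
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    h_c = entropy([labels.count(c) for c in classes])
    scores = {}
    for term in sorted(set().union(*doc_sets)):
        with_t = Counter(lab for s, lab in zip(doc_sets, labels) if term in s)
        without_t = Counter(lab for s, lab in zip(doc_sets, labels) if term not in s)
        n_t = sum(with_t.values())
        h_cond = 0.0
        if n_t > 0:
            h_cond += (n_t / n) * entropy([with_t[c] for c in classes])
        if n - n_t > 0:
            h_cond += ((n - n_t) / n) * entropy([without_t[c] for c in classes])
        scores[term] = h_c - h_cond
    return scores


def vectorize(doc, features, scheme="boolean"):
    """Boolean weight = presence (0/1); frequency weight = raw term count."""
    counts = Counter(doc)
    if scheme == "boolean":
        return [1 if counts[f] > 0 else 0 for f in features]
    if scheme == "frequency":
        return [counts[f] for f in features]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Hypothetical toy reviews (not the paper's automobile corpus).
tagged = [
    [("动力", "n"), ("很", "d"), ("强", "a")],
    [("油耗", "n"), ("太", "d"), ("高", "a")],
    [("外观", "n"), ("很", "d"), ("漂亮", "a"), ("漂亮", "a")],
    [("空间", "n"), ("太", "d"), ("小", "a")],
]
labels = ["pos", "neg", "pos", "neg"]

docs = [apply_pos_stoplist(d) for d in tagged]
ig = information_gain(docs, labels)
features = sorted(ig, key=lambda t: (-ig[t], t))[:2]  # top-2 terms by IG
print(features)                                         # ['太', '很']
print(vectorize(docs[2], ["很", "漂亮"], "boolean"))    # [1, 1]
print(vectorize(docs[2], ["很", "漂亮"], "frequency"))  # [1, 2]
```

In this toy data the adverbs 很 and 太 each separate the two classes perfectly, so they get the maximum information gain of 1 bit. The last two lines show how the Boolean and frequency schemes differ only when a term repeats within a document. In a full pipeline, the resulting vectors would then be passed to an SVM classifier.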