期刊文献+

基于Word2fea模型的文本建模方法 被引量:1

Text Modeling Method Based on Word2fea Model
在线阅读 下载PDF
导出
摘要 文本聚类在数据挖掘和机器学习中发挥着重要作用,该技术经过多年的发展,已产生了一系列的理论成果。传统向量空间模型的文本建模方法存在维度高、数据稀疏和缺乏语义信息等问题,然而仅仅引入词典的文本建模部分解决了语义问题却又受限于人工词典词量少、人工耗力大等多种问题。文中借鉴主题模型的思想,提出一种以word2vec算法得到词向量为基础,词聚类的类别为主题,结合文本中主题的频率、分布范围、位置因子等特征以获得文本在类别空间上的特征向量,完成文本建模的方法 word2fea。将其与两种文本建模方法 VSM和word2vec_base进行比较,实验结果表明该方法能够明显提高文本分类准确率。 Text classification plays an important role in data mining and machine learning,which has produced a series of theory after years of development. The traditional text modeling method of vector space model has the problems of high dimension,sparse data,and the lack of semantic. However,the text modeling introduced the artificial dictionary is constrained by quantity of words,artificial power consumption and other problems. By referencing the idea of topic model,a text modeling method word2 fea was presented which based on the model of word2 vec for the topic clusters with the word vectors,meanwhile combined with the frequency,distribution and location of the topic on documents to obtain the feature of the text. Compared with two text modeling methods,VSMand word2vec_base,the experimental results show that this method can significantly improve the accuracy of text classification.
出处 《计算机技术与发展》 2016年第2期165-167,173,共4页 Computer Technology and Development
基金 中央高校基本科研业务费专项资金(2014B33014)
关键词 word2vec 文本建模 文本分类 word2fea word2vec text modeling text classification word2fea
  • 相关文献

参考文献14

二级参考文献81

  • 1王燕.一种改进的K-means聚类算法[J].计算机应用与软件,2004,21(10):122-123. 被引量:9
  • 2吴健,吴朝晖,李莹,邓水光.基于本体论和词汇语义相似度的Web服务发现[J].计算机学报,2005,28(4):595-602. 被引量:218
  • 3H Y Tan. Chinese place automatic recognition research. In: C N Huang, Z D Dong, eds. Proc of Computational Language.Beijing: Tsinghua University Press, 1999
  • 4Zhang Huaping, Liu Qun, Zhang Hao, et al. Automatic recognition of Chinese unknown words recognition. First SIGHAN Workshop Attached with the 19th COLING, Taipei, 2002
  • 5S R Ye, T S Chua, J M Liu. An agent-based approach to Chinese named entity recognition. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002
  • 6J Sun, J F Gao, L Zhang, et al. Chinese named entity identification using class-based language model. The 19th Int'l Conf on Computational Linguistics, Taipei, 2002
  • 7Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc of IEEE, 1989,77(2): 257~286
  • 8Shai Fine, Yoram Singer, Naftali Tishby. The hierarchical hidden Markov model: Analysis and applications. Machine Learning,1998, 32(1): 41~62
  • 9Richard Sproat, Thomas Emerson. The first international Chinese word segmentation bakeoff. The First SIGHAN Workshop Attached with the ACL2003, Sapporo, Japan, 2003. 133~143
  • 10J Hockenmaier, C Brew. Error-driven learning of Chinese word segmentation. In: J Guo, K T Lua, J Xu, eds. The 12th Pacific Conf on Language and Information, Singapore, 1998

共引文献488

同被引文献5

引证文献1

二级引证文献5

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部