
A Staged and Integrated Semantic Similarity Algorithm of Text (分阶段融合的文本语义相似度计算方法)
Cited by: 4
Abstract: For information retrieval over Chinese text, this paper proposes a text similarity method that proceeds in stages, from sentences to paragraphs to the whole document. The method takes the document's topic and application scope into account and assigns weights to feature words using a semantically enhanced weighting scheme. At each stage it incorporates semantic factors suited to that stage's characteristics, with the aim of making similarity results for Chinese text more accurate. Finally, a text similarity computing system is built, and an experimental comparison with traditional algorithms shows that the improved algorithm achieves better results.
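The abstract gives no formulas, so the following is only a minimal sketch of what a sentence-to-paragraph-to-document pipeline could look like, not the paper's actual algorithm. It assumes that text is already segmented into token lists, that sentence similarity is a weighted cosine, and that each higher stage averages best matches symmetrically. The function names and the `weights` dictionary (a stand-in for the semantically enhanced term weights) are hypothetical.

```python
import math
from collections import Counter


def sentence_sim(s1, s2, weights=None):
    """Weighted cosine similarity of two token lists.

    `weights` maps term -> weight (assumed default 1.0); it stands in for
    the paper's semantically enhanced weights, which the abstract does not
    specify.
    """
    weights = weights or {}
    v1 = {t: c * weights.get(t, 1.0) for t, c in Counter(s1).items()}
    v2 = {t: c * weights.get(t, 1.0) for t, c in Counter(s2).items()}
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def best_match_avg(units_a, units_b, sim):
    """Average each unit's best match on the other side, in both directions."""
    if not units_a or not units_b:
        return 0.0
    a_to_b = sum(max(sim(a, b) for b in units_b) for a in units_a) / len(units_a)
    b_to_a = sum(max(sim(b, a) for a in units_a) for b in units_b) / len(units_b)
    return (a_to_b + b_to_a) / 2


def paragraph_sim(p1, p2, weights=None):
    """Stage 2: a paragraph is a list of sentences (token lists)."""
    return best_match_avg(p1, p2, lambda a, b: sentence_sim(a, b, weights))


def document_sim(d1, d2, weights=None):
    """Stage 3: a document is a list of paragraphs."""
    return best_match_avg(d1, d2, lambda a, b: paragraph_sim(a, b, weights))
```

For Chinese input, a word segmenter would have to produce the token lists first; that step is outside this sketch.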
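Author: Ma Junhong (马军红)
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2013, No. 10, pp. 20-26 (7 pages)
Funding: One of the research outputs of the Shaanxi Provincial Department of Education Scientific Research Program project "Research on a Bidirectional Sequence Encryption Method Based on Real-Time Embedded Security" (project No. 2013JK1146)
Keywords: Text similarity; Information retrieval; Semantic similarity; Term weight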
