
关联模式挖掘与词向量学习融合的伪相关反馈查询扩展 (Cited by: 4)

Pseudo-Relevance Feedback Query Expansion Based on the Fusion of Association Pattern Mining and Word Embedding Learning
Abstract: To address the problems of query topic drift and word mismatch in natural language processing, an algorithm for association pattern mining and rule expansion based on the CSC (Copulas-based Support and Confidence) framework is proposed. The association patterns obtained by statistical analysis are fused with word embeddings that carry contextual semantic information, and a pseudo-relevance feedback query expansion model based on the fusion of association pattern mining and word embedding learning is presented. In this model, rule expansion terms are mined from the pseudo-relevance feedback document set, and word vectors are obtained by word embedding training on the initial retrieval document set. The vector similarity between each rule expansion term and the original query is then calculated, and the rule expansion terms whose vector similarity is not lower than a threshold are extracted as the final expansion terms. Experimental results show that the proposed expansion model effectively reduces query topic drift and word mismatch and improves retrieval performance. Compared with existing query expansion methods based on association patterns and on word embeddings, the average increase in MAP (Mean Average Precision) of the proposed model reaches up to 17.52%, and the model is especially effective for short queries. The proposed mining method can also be applied to other text mining tasks and to recommendation systems to improve their performance.
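The abstract describes a two-stage pipeline: association-rule mining over the pseudo-relevance feedback documents produces candidate expansion terms, and word embeddings trained on the initial retrieval results are used to keep only the candidates whose similarity to the original query reaches a threshold. The following Python sketch illustrates only that final filtering step under stated assumptions; the function names, the averaged query vector, and the example threshold of 0.5 are illustrative and are not taken from the paper.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

def filter_expansion_terms(rule_terms, query_terms, vectors, threshold=0.5):
    # rule_terms  : candidate expansion terms mined by the association-rule step
    # query_terms : terms of the original query
    # vectors     : dict mapping a term to its embedding (np.ndarray), e.g. trained
    #               on the initial retrieval document set (hypothetical interface)
    query_vecs = [vectors[t] for t in query_terms if t in vectors]
    if not query_vecs:
        return []
    query_vec = np.mean(query_vecs, axis=0)  # simple averaged query representation (assumption)
    return [t for t in rule_terms
            if t in vectors and cosine(vectors[t], query_vec) >= threshold]

With embeddings loaded into the vectors dictionary from any trained model, filter_expansion_terms(candidates, original_query_terms, vectors) would return the subset of mined candidates that is appended to the query before the second retrieval pass.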
Author: HUANG Ming-xuan (黄名选) (Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China; School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China)
Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, Peking University Core, 2021, No. 7, pp. 1305-1313 (9 pages)
Funding: National Natural Science Foundation of China (No. 61762006).
Keywords: natural language processing; information retrieval; text mining; word embedding; query expansion
