
Topic Evolution Research on a Certain Field Based on LDA Topic Association Filter
Cited by: 27
Abstract [Objective] To trace the evolution paths of topics in the literature of a given field, detecting topic birth, extinction, inheritance, splitting, and merging. [Methods] The literature is divided into multiple time windows according to publication date, and an LDA topic model identifies the topics within each time window. Topic association filter rules then determine the evolution relationships between topics in adjacent time windows, yielding evolution paths of topic birth, extinction, inheritance, splitting, and merging over a continuous time period. [Results] While preserving topic continuity, the method identifies the evolution types of birth, extinction, inheritance, splitting, and merging more accurately. [Limitations] The time windows have a fixed size, so the diversity of topic evolution cycles is not taken into account. [Conclusions] The method effectively reduces interference from weakly similar topics in the LDA topic model and improves the accuracy of identifying topic evolution relationships.
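The pipeline described in the abstract (fixed time windows by publication date, per-window LDA topics, association filtering between adjacent windows, then labeling of evolution types) can be sketched in Python as below. This is a minimal, hypothetical sketch, not the authors' method: the abstract does not state the filter rules, so cosine similarity with a fixed threshold (assumed here to be 0.5) stands in for them, and all function names are illustrative. It also assumes the LDA step has already produced, for every window, topic-word probability vectors over one shared vocabulary, since vectors over different vocabularies cannot be compared directly.

```python
"""Hypothetical sketch of adjacent-window topic association filtering.

Assumptions (not from the source): topics are topic-word probability
vectors over one shared vocabulary, similarity is cosine, and the
association filter is a fixed similarity threshold.
"""
from collections import defaultdict
from math import sqrt


def window_index(year, start_year, window_size):
    """Map a publication year to a fixed-size time window (0-based)."""
    return (year - start_year) // window_size


def group_by_window(docs, start_year, window_size):
    """Group (year, text) pairs into fixed-size time windows, ordered by window."""
    windows = defaultdict(list)
    for year, text in docs:
        windows[window_index(year, start_year, window_size)].append(text)
    return dict(sorted(windows.items()))


def cosine(p, q):
    """Cosine similarity between two topic-word vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def associate(prev_topics, next_topics, threshold=0.5):
    """Keep only (prev, next) topic pairs whose similarity reaches the threshold.

    The threshold is an assumed stand-in for the paper's filter rules; it
    drops weakly similar pairs that would otherwise create spurious links.
    """
    return [
        (i, j)
        for i, p in enumerate(prev_topics)
        for j, q in enumerate(next_topics)
        if cosine(p, q) >= threshold
    ]


def classify(n_prev, n_next, links):
    """Label evolution events between two adjacent windows from the kept links."""
    succ, pred = defaultdict(set), defaultdict(set)
    for i, j in links:
        succ[i].add(j)
        pred[j].add(i)
    events = []
    for j in range(n_next):
        if not pred[j]:  # no predecessor: a new topic is born
            events.append(("birth", j))
    for i in range(n_prev):
        if not succ[i]:  # no successor: the topic dies out
            events.append(("extinction", i))
        elif len(succ[i]) > 1:  # one topic continues as several
            events.append(("split", i, sorted(succ[i])))
    for j in range(n_next):
        if len(pred[j]) > 1:  # several topics continue as one
            events.append(("merge", sorted(pred[j]), j))
    for i, j in links:
        if len(succ[i]) == 1 and len(pred[j]) == 1:  # clean one-to-one link
            events.append(("inheritance", i, j))
    return events


if __name__ == "__main__":
    # Toy topic-word vectors over a 7-word vocabulary (illustrative data only).
    prev = [
        [1, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 1],
    ]
    nxt = [
        [0.9, 0.1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1],
    ]
    links = associate(prev, nxt)
    print(classify(len(prev), len(nxt), links))
    # [('birth', 3), ('split', 1, [1, 2]), ('extinction', 2), ('merge', [3, 4], 4), ('inheritance', 0, 0)]
```

Because topics are linked only through pairs that pass the filter, a weakly similar pair (old topic 1 and new topic 0 in the toy data, cosine about 0.08) does not create a spurious split of topic 1 or merge into topic 0. Raising or lowering the assumed threshold trades missed links against spurious ones.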
Source: New Technology of Library and Information Service (现代图书情报技术), CSSCI-indexed, 2015, Issue 3, pp. 18-25 (8 pages)
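Funding: One of the research outcomes of the National Key Technology R&D Program sub-project "Research and Demonstration of Domain Academic Relationships Based on a Literature Knowledge Network" (Project No. 2011BAH10B06-04)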
Keywords: topic association; topic evolution; topic model LDA

References: 10 | Secondary references: 125

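Documents sharing references with this article: 288 | Documents co-cited with this article: 343 | Citing documents: 27 | Secondary citing documents: 258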
