Analysis of Microblog Topic Evolution Based on the Latent Dirichlet Allocation Model (cited by: 28)
Abstract: Analyzing how microblog topics evolve helps users quickly and accurately grasp the structure of a topic, track its development, and make predictions based on its evolution. This paper extends the probabilistic topic model LDA (Latent Dirichlet Allocation) to handle short Chinese microblog texts and uses the LDA results to analyze microblog topic evolution. To capture the dynamics of topic evolution across time slices, the optimal number of topics is first determined for each time slice before LDA modeling. The topics extracted by LDA are then used to track how topics change across time slices, giving an evolution analysis in terms of both content and strength. Experiments on a real microblog corpus show that the method not only analyzes fairly well how the strength of a given microblog topic evolves over time, but also describes the evolution trend of the topic's content.
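Source: Journal of the China Society for Scientific and Technical Information (《情报学报》; indexed in CSSCI and the Peking University Core Journals list), 2013, No. 3, pp. 281-287 (7 pages)
Funding: One of the research outputs of the major project "Decision-Oriented Research on Enterprise Information Resource Integration" of the Key Research Bases of Humanities and Social Sciences, Ministry of Education (Grant No. 2009JJD870002)
Keywords: LDA model; topic evolution; JS distance; microblog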
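The abstract and keywords name JS distance and a content/strength view of topic evolution but give no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that topics from adjacent time slices are represented as word distributions over a shared vocabulary, that a topic in the later slice is linked to its nearest predecessor by Jensen-Shannon divergence, and that topic strength is the mean topic proportion over the documents in a slice. The function names, the `threshold` value, and the toy distributions are all hypothetical.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in base 2, so the value lies in [0, 1]."""
    # Smooth with a tiny constant so zero probabilities do not break the logs.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Link each topic of the later slice to its closest earlier topic.

    Returns (next_index, prev_index_or_None, distance) tuples; None marks a
    topic with no predecessor within the (assumed) threshold, i.e. a new topic.
    """
    links = []
    for j, q in enumerate(next_topics):
        dists = [js_divergence(p, q) for p in prev_topics]
        i = int(np.argmin(dists))
        links.append((j, i if dists[i] <= threshold else None, dists[i]))
    return links


def topic_strength(doc_topic):
    """Assumed strength measure: mean topic proportion over a slice's documents."""
    return np.asarray(doc_topic, dtype=float).mean(axis=0)


# Toy topic-word distributions over a shared six-word vocabulary.
prev_slice = [[0.6, 0.4, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]]
next_slice = [[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],   # continues earlier topic 0
              [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]]   # shares no words with earlier topics

links = link_topics(prev_slice, next_slice)
strength = topic_strength([[0.8, 0.2], [0.4, 0.6]])
```

Chaining the links across consecutive slices would give a content trail for each topic, and plotting `topic_strength` per slice would give its strength curve. Because the number of topics is chosen separately for each slice, the linking step compares topic sets of different sizes, which this sketch allows.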