Abstract
Analyzing how microblog topics evolve helps users quickly and accurately understand the structure of topic threads, track how topics develop, and make predictions based on that evolution. This paper extends the probabilistic topic model LDA (Latent Dirichlet Allocation) so that it can handle Chinese microblog short texts, and uses the LDA modeling results to analyze microblog topic evolution. To capture the dynamic nature of topic evolution across time slices, the optimal number of topics in each time slice is determined before LDA modeling. The topics extracted by LDA are then used to track how topics change from one time slice to the next, which supports evolution analysis along two dimensions: content and strength. Experiments on a real microblog corpus show that the method not only captures how the strength of a given microblog topic evolves over time, but also describes how its content evolves.
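The abstract does not state which criteria are used to pick the number of topics, measure strength, or link topics across slices. The sketch below is a minimal illustration under assumptions that are not taken from the paper: K for a slice is chosen by minimizing the average pairwise cosine similarity between topic-word distributions (one criterion used in the LDA literature), topic strength is the mean document-topic weight within a slice, and content evolution links each topic to its closest predecessor by Jensen-Shannon divergence under a hypothetical threshold. All function names are hypothetical, and LDA training itself is omitted: the inputs `theta` (document-topic) and `phi` (topic-word) are assumed to come from any LDA implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(p: Vector, q: Vector) -> float:
    """Cosine similarity between two non-negative vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def avg_topic_similarity(phi: List[Vector]) -> float:
    """Mean pairwise cosine similarity between the topic-word rows of one model.

    Assumed criterion: lower values mean more distinct topics.
    """
    k = len(phi)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if not pairs:
        return 0.0
    return sum(cosine(phi[i], phi[j]) for i, j in pairs) / len(pairs)


def pick_num_topics(candidates: Dict[int, List[Vector]]) -> int:
    """Return the K whose fitted topic-word matrix has the lowest average similarity."""
    return min(candidates, key=lambda k: avg_topic_similarity(candidates[k]))


def topic_strength(theta: List[Vector]) -> List[float]:
    """Assumed strength of each topic in a slice: mean weight over the slice's documents."""
    n_docs = len(theta)
    k = len(theta[0])
    return [sum(doc[z] for doc in theta) / n_docs for z in range(k)]


def js_divergence(p: Vector, q: Vector) -> float:
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    def kl(a: Vector, b: Vector) -> float:
        # Terms with a zero in `a` contribute nothing; b > 0 wherever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def link_topics(
    phi_prev: List[Vector], phi_next: List[Vector], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Match each topic in the next slice to its closest topic in the previous slice.

    Returns (prev_topic, next_topic, divergence) for pairs below the hypothetical
    `threshold`; next-slice topics without a match are treated as newly emerging.
    """
    links = []
    for j, q in enumerate(phi_next):
        best_div, best_i = min((js_divergence(p, q), i) for i, p in enumerate(phi_prev))
        if best_div < threshold:
            links.append((best_i, j, best_div))
    return links


if __name__ == "__main__":
    # Toy topic-word matrices over a 4-word vocabulary, invented for illustration.
    phi_t1 = [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.2, 0.7]]
    phi_t2 = [[0.6, 0.3, 0.05, 0.05], [0.25, 0.25, 0.25, 0.25]]
    theta_t1 = [[0.9, 0.1], [0.6, 0.4]]
    print(topic_strength(theta_t1))
    print(link_topics(phi_t1, phi_t2, threshold=0.1))
```

Tracking `topic_strength` for a linked topic across consecutive slices gives its strength curve, while the chain of `link_topics` matches, together with each matched topic's top words, describes how its content drifts. Jensen-Shannon divergence is used here rather than KL divergence because it is symmetric and stays finite when one distribution assigns zero probability to a word.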
Source
《情报学报》 (Journal of the China Society for Scientific and Technical Information)
CSSCI
Peking University Core Journal (北大核心)
2013, Issue 3, pp. 281-287 (7 pages)
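Funding
One of the outcomes of the major project "Research on Decision-Oriented Integration of Enterprise Information Resources" (Grant No. 2009JJD870002), funded by the Key Research Bases for Humanities and Social Sciences of the Ministry of Education.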