Abstract
To discover topics of interest in forum data and track their evolution, this paper first uses the latent Dirichlet allocation (LDA) model to reduce the text from the vocabulary space to the topic space, then applies a clustering algorithm to the document collection in the topic space, and obtains hot topics with a proposed hot-topic detection method. Building on the detected hot topics, the paper proposes HTOLDA (Hot-Topic OLDA), a model for tracking the evolution of hot forum topics that is based on the online LDA (OLDA) topic model. HTOLDA passes priors forward only for hot topics and uses the semantic distance between the same topic in adjacent time slices to judge the topic's status. Experimental results show that HTOLDA models the forum data of each time slice better than OLDA and that it tracks the evolution of hot forum topics effectively.
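The abstract does not say which semantic distance or which status labels HTOLDA uses, so the following is only a minimal sketch of the general idea of comparing one topic's word distribution across two adjacent time slices. The choice of Jensen-Shannon divergence, the threshold value of 0.5, and the labels "continuing" and "drifted" are all assumptions made for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies in [0, 1]:
    0 for identical distributions, 1 for distributions with disjoint support.
    """
    # The mixture m is positive wherever p or q is positive,
    # so kl_divergence never divides by zero here.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_status(prev_topic, curr_topic, threshold=0.5):
    """Label a topic by how far its word distribution moved between slices.

    `threshold` and the two labels are hypothetical; the paper's own
    distance measure and status definitions may differ.
    """
    distance = js_divergence(prev_topic, curr_topic)
    return "continuing" if distance <= threshold else "drifted"


# Word distributions of one topic over a shared 3-word vocabulary
# in time slice t-1 and time slice t (made-up numbers).
slice_prev = [0.7, 0.2, 0.1]
slice_curr = [0.6, 0.3, 0.1]
print(topic_status(slice_prev, slice_curr))
```

In an online setting, a check of this kind would run once per topic per time slice, after the current slice's model has been fitted and before its topics are passed on as priors.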
Source
Journal of South China University of Technology (Natural Science Edition)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
2016, No. 5, pp. 130-136 (7 pages)
Funding
National Key Technology R&D Program of China (2012BAH18B05)
National Natural Science Foundation of China (61272447)