
Research on an Internet Public Opinion Analysis System Based on an Improved Clustering Algorithm (cited by: 14)

Research and Implementation of Desktop Search Engine Based on Tika and Lucene
Abstract: To address the characteristics of internet public opinion mining, this paper proposes STCC (Similarity Threshold Control Clustering Based on VSM), a text clustering algorithm built on the vector space model (VSM). The algorithm first follows the bottom-up agglomerative strategy of hierarchical clustering to obtain initial clusters; then, following the idea of K-means, it merges clusters using a preset clustering similarity threshold as the measure. It combines the strengths of hierarchical clustering and K-means while avoiding their weaknesses. Unlike hierarchical clustering, it does not need to compare the similarity of all clusters at every clustering step, which lowers the time complexity and improves clustering efficiency. Unlike K-means, it does not require the value of K to be fixed in advance, which makes it more flexible. Experimental results indicate that the algorithm clusters well, is practical, and is suitable for large-scale text clustering.
Source: 情报学报 (Journal of the China Society for Scientific and Technical Information), CSSCI, Peking University Core Journal, 2014, No. 5, pp. 530-537 (8 pages)
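Funding: Supported by the National Natural Science Foundation of China (61373161) and the "Young and Middle-aged Backbone Talent" project of the Beijing municipal universities' program for deepening talent-driven education (PHR201008083)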
Keywords: internet public opinion; data mining; keyword extraction; text clustering
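
The abstract describes the clustering idea only at a high level, so the following is a minimal sketch of threshold-controlled clustering over a vector space model, not the authors' STCC implementation. It assumes whitespace tokenization, raw term-frequency weights, cosine similarity, and a greedy merge against cluster centroids; the function names (`tf_vector`, `cosine`, `centroid`, `threshold_cluster`) and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map a whitespace-tokenized document to a term-frequency vector (VSM)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like maps."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean vector of a cluster, used as its K-means-style representative."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def threshold_cluster(docs, threshold=0.5):
    """Assumed simplification: greedy, threshold-controlled merging.

    Every document starts as a candidate cluster of its own (bottom-up).
    It is merged into the most similar existing cluster when its cosine
    similarity to that cluster's centroid reaches `threshold`; otherwise
    it opens a new cluster. The number of clusters is never fixed up front.
    """
    clusters = []  # each entry holds member indices, member vectors, centroid
    for i, doc in enumerate(docs):
        vec = tf_vector(doc)
        best, best_sim = None, 0.0
        for c in clusters:
            # Compare against one centroid per cluster, not every cluster pair.
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["vectors"].append(vec)
            best["centroid"] = centroid(best["vectors"])
        else:
            clusters.append({"members": [i], "vectors": [vec], "centroid": vec})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    sample = [
        "stock market falls",
        "stock market rises",
        "football match tonight",
        "football match result",
    ]
    print(threshold_cluster(sample, threshold=0.5))
```

In this sketch the threshold plays the role that K plays in K-means: raising it yields more, tighter clusters, and lowering it yields fewer, broader ones. Each document is compared with one centroid per existing cluster, which mirrors the abstract's point about avoiding a full comparison of all clusters at every step.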
