Analysis of Mean Shift Algorithm Based on Spark Framework in Public Opinion Clustering (cited by: 1)
Abstract: To improve the ability to analyze public opinion information, a mean shift algorithm based on the Spark framework is designed and implemented. Features are extracted from public opinion text using Ansj word segmentation and the Word2vec algorithm, and clustering is then performed using the Spark parallel computing framework and the principle of the mean shift algorithm. Experimental results show that the mean shift algorithm achieves an accuracy above 90% on both the Iris and Wine data sets, that its clustering results are clearly better than those of the K-means algorithm, and that it adapts well. Performance experiments show that increasing the degree of parallelism of the running program improves the running efficiency of the mean shift algorithm and gives it better data scalability. The Spark-based mean shift algorithm can therefore effectively improve the ability to analyze public opinion information and help establish a healthy network environment.
Authors: ZHANG Jing-kun (张京坤); WANG Yi-yi (王怡怡) (Taiji Computer Corporation, China Electronics Technology Group Corporation, Beijing 100020, China; School of Mathematics and Information Science, Shaanxi Normal University, Xi'an 710100, China)
Source: Software Guide (《软件导刊》), 2022, No. 6, pp. 141-146 (6 pages)
Keywords: public opinion; Spark; mean shift; clustering; parallelization
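
The abstract names the mean shift algorithm but does not spell out its update rule. The following is a minimal single-machine sketch of that rule, not the authors' Spark implementation. It assumes a flat (uniform) kernel, Euclidean distance, and a simple merge step for converged points. The function names `shift_point` and `mean_shift` and the parameters `bandwidth`, `max_iter`, and `tol` are hypothetical. Each point's update depends only on the fixed data set, so this per-point step is the part that could plausibly be distributed as a map over a Spark collection; that mapping is an assumption here.

```python
import math


def shift_point(point, data, bandwidth):
    """Move one point to the mean of all data points within `bandwidth` (flat kernel)."""
    neighbors = [p for p in data if math.dist(point, p) <= bandwidth]
    if not neighbors:
        # No data inside the window: leave the point where it is.
        return point
    dims = len(point)
    return tuple(sum(p[d] for p in neighbors) / len(neighbors) for d in range(dims))


def mean_shift(data, bandwidth, max_iter=100, tol=1e-6):
    """Return (centers, labels) for `data`, a sequence of equal-length numeric tuples."""
    data = [tuple(p) for p in data]
    points = list(data)
    for _ in range(max_iter):
        # Each update is independent of the others, which is what makes
        # this step a candidate for parallel execution (assumption: a map
        # over a distributed collection in a Spark setting).
        shifted = [shift_point(p, data, bandwidth) for p in points]
        moved = max(math.dist(a, b) for a, b in zip(points, shifted))
        points = shifted
        if moved < tol:
            break

    # Merge converged points that lie close together into one cluster center.
    centers, labels = [], []
    for p in points:
        for i, c in enumerate(centers):
            if math.dist(p, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return centers, labels
```

Unlike K-means, this procedure does not take the number of clusters as input; the `bandwidth` window size determines how many modes are found, so it has to be chosen to suit the scale of the feature vectors.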