Parallel Agnes Algorithm Based on MapReduce
Abstract: The traditional Agnes algorithm runs into memory and CPU processing-speed limits when it handles large volumes of data. To address this, a parallel Agnes algorithm based on the MapReduce framework is proposed, and its main design is described. The Map phase initializes the clusters; the Reduce phase computes the distances between clusters and merges the closest ones. To account for relationships among attributes, Mahalanobis distance replaces Euclidean distance when inter-cluster distances are computed. Finally, datasets of different sizes are used to evaluate the speedup and scalability of the improved algorithm. The experimental results show that the MapReduce-based parallel Agnes algorithm is suitable for analyzing and mining large volumes of data.
Authors: ZHANG Guo-guang (张国光); GONG Xiu-gang (巩秀钢); YU Xu-dong (于旭东); FENG Shao-wen (冯韶文) (School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China)
Source: Science & Technology Vision (《科技视界》), 2018, No. 10, pp. 113-115 (3 pages)
Keywords: MapReduce; parallel Agnes; massive data; Mahalanobis distance
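
The Map/Reduce split described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation and does not use Hadoop: it only mimics the two phases in plain Python. All function names (`map_phase`, `reduce_phase`, `parallel_agnes_sketch`, and the helpers) are hypothetical. Two further assumptions are made because the abstract does not specify them: the inter-cluster distance is taken as the minimum pairwise distance (single linkage), and the Mahalanobis distance d(x, y) = sqrt((x - y)^T S^-1 (x - y)) uses a covariance matrix S estimated once from the whole dataset.

```python
import math
from itertools import combinations


def covariance(points):
    """Sample covariance matrix of the dataset (assumed source of S)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between x and y given the inverse covariance."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))


def map_phase(points):
    """Map: initialize clusters by emitting one singleton cluster per record."""
    return {i: [p] for i, p in enumerate(points)}


def reduce_phase(clusters, inv_cov):
    """Reduce: compute inter-cluster distances and merge the closest pair."""
    def linkage(a, b):
        # Single linkage (an assumption): smallest member-to-member distance.
        return min(mahalanobis(x, y, inv_cov)
                   for x in clusters[a] for y in clusters[b])

    a, b = min(combinations(sorted(clusters), 2), key=lambda pair: linkage(*pair))
    merged = dict(clusters)
    merged[a] = merged[a] + merged.pop(b)
    return merged


def parallel_agnes_sketch(points, k):
    """Run one Map step, then repeat Reduce steps until k clusters remain."""
    inv_cov = invert(covariance(points))
    clusters = map_phase(points)
    while len(clusters) > k:
        clusters = reduce_phase(clusters, inv_cov)
    return list(clusters.values())


if __name__ == "__main__":
    data = [(0.0, float(y)) for y in range(4)] + [(10.0, float(y)) for y in range(4)]
    for cluster in parallel_agnes_sketch(data, 2):
        print(cluster)
```

On the toy data in the usage example, the two columns of points end up as the two clusters. Because Mahalanobis distance rescales each direction by the data's own spread, results on strongly correlated attributes can differ noticeably from those of Euclidean distance, which is the motivation the abstract gives for the substitution.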