Abstract
The Affinity Propagation (AP) algorithm tends to produce too many local clusters and to have low accuracy when clustering non-spherical data sets. To address these problems, a clustering quality evaluation model based on an improved AP algorithm is proposed. First, starting from the initial AP clustering, clusters with high similarity are merged to reduce the upper limit kmax on the number of clusters, which further narrows the range searched for the cluster number. Second, a new internal evaluation index is given in which the average distance between sample pairs belonging to different clusters represents the inter-cluster distance; this weakens the influence of noisy data and balances inter-cluster separation against intra-cluster compactness. Experimental results on the UCI and KDD CUP99 data sets show that the new model gives an accurate optimal number of clusters (or range), and effectively improves the detection rate and classification accuracy of samples while maintaining a low miss rate.
Authors
邹臣嵩
段桂芹
欧阳明星
刘锋
ZOU Chen-song; DUAN Gui-qin; OUYANG Ming-xing; LIU Feng (Department of Electrical Engineering, Guangdong Songshan Polytechnic College, Shaoguan, Guangdong 512126, China)
Source
《西南师范大学学报(自然科学版)》
CAS
PKU Core (Peking University Core Journals)
2020, No. 6, pp. 97-106 (10 pages)
Journal of Southwest China Normal University(Natural Science Edition)
Funding
Guangdong Provincial Education Science Planning Project (2018GXJK339).
Keywords
cluster evaluation index
affinity propagation
internal evaluation index
optimal clustering number