Abstract
Density-based clustering algorithms can discover clusters of arbitrary shape, handle outliers effectively, and scale to large datasets, but they are poorly suited to clustering high-dimensional data. This paper therefore proposes a density-based clustering algorithm built on principal component analysis (PCA): the DBSCAN algorithm is applied in the subspace spanned by the k principal components retained by PCA, which addresses the difficulty of applying DBSCAN to high-dimensional datasets. Experiments on meteorological data show that the choice of the number of principal components k strongly affects the clustering result, so a basic method for selecting k is proposed. With k chosen correctly, the algorithm achieves good clustering performance.
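The pipeline summarized above (project the data onto k principal components, then run DBSCAN in that subspace) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, computes PCA by eigendecomposition of the sample covariance matrix, and implements textbook DBSCAN with Euclidean distance. The abstract does not describe the paper's rule for choosing k, so the hypothetical helper `choose_k` uses a cumulative explained-variance threshold (default 0.85), a common PCA heuristic that may differ from the authors' method. The function names, the synthetic data, and the `eps`/`min_pts` values are all hypothetical.

```python
import numpy as np


def pca_scores(X):
    """Center X and return its scores on all principal components
    (columns ordered by decreasing variance) plus the sorted eigenvalues."""
    Xc = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(Xc, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort descending
    return Xc @ eigvecs[:, order], np.clip(eigvals[order], 0.0, None)


def choose_k(eigvals, threshold=0.85):
    """Hypothetical rule (assumption, not the paper's): smallest k whose
    cumulative explained-variance ratio reaches the threshold."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(eigvals))


def dbscan(Z, eps, min_pts):
    """Textbook DBSCAN. Label -1 marks noise; clusters are numbered 0, 1, ..."""
    n = len(Z)
    # Full pairwise Euclidean distance matrix: O(n^2) memory, fine for a sketch.
    dist = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    # Each eps-neighbourhood includes the point itself.
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    UNVISITED, NOISE = -2, -1
    labels = np.full(n, UNVISITED)
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        seeds = list(neighbors[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                seeds.extend(neighbors[j])  # j is also core: keep growing
        cluster += 1
    return labels


def pca_dbscan(X, eps, min_pts, k=None, threshold=0.85):
    """Run DBSCAN in the subspace spanned by k principal components of X."""
    scores, eigvals = pca_scores(X)
    if k is None:
        k = choose_k(eigvals, threshold)
    return k, dbscan(scores[:, :k], eps, min_pts)


if __name__ == "__main__":
    # Hypothetical synthetic data: two groups of 50 points in 10 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (50, 10)),
                   rng.normal(5.0, 0.3, (50, 10))])
    k, labels = pca_dbscan(X, eps=1.0, min_pts=5)
    print("k =", k, "labels:", sorted(set(labels.tolist())))
```

On this synthetic data almost all variance lies along the direction separating the two groups, so the threshold rule keeps a single component and DBSCAN in that one-dimensional subspace separates the groups. Note that `eps` is measured in the projected k-dimensional space, so a value tuned for one k need not suit another.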
Source
Journal of Tianjin Institute of Urban Construction (《天津城市建设学院学报》)
CAS
2012, No. 1, pp. 60-62, 76 (4 pages)
Keywords
clustering
density clustering algorithm
principal component analysis