Abstract
This paper proposes a new soft clustering algorithm for documents. Keywords are mapped and expanded through each document's title and abstract, and are weighted by the position in which they occur, to construct the text vector space. The optimal number of clusters K and the hard clusters are identified automatically from the pattern of change in inter-class and intra-class similarity during fuzzy maximum spanning tree clustering. With each hard cluster as a core, the clustering similarity is reduced to a lower similarity bound to extend the cluster, which yields the corresponding soft cluster. Experiments show that the algorithm effectively reduces feature dimensionality and improves the accuracy and speed of soft clustering.
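The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the position weights, the cosine similarity measure, the crisp (non-fuzzy) maximum spanning tree, and the rule for extending a core cluster are all assumptions, and every function name is hypothetical. The automatic selection of K is not reproduced, because the abstract does not specify the rule; the `upper` and `lower` similarity thresholds are simply passed in.

```python
import math
from collections import Counter

# Assumed position weights; the abstract does not give the actual values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 1.0}


def weighted_vector(doc, vocabulary):
    """Build a position-weighted term vector restricted to a keyword vocabulary."""
    vec = Counter()
    for field, weight in POSITION_WEIGHTS.items():
        for token in doc.get(field, "").lower().split():
            if token in vocabulary:
                vec[token] += weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def maximum_spanning_tree(sim):
    """Prim's algorithm on a dense similarity matrix; returns edges (s, i, j)."""
    n = len(sim)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the most similar edge that leaves the current tree.
        s, i, j = max(
            (sim[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((s, i, j))
        in_tree.add(j)
    return edges


def hard_clusters(n, edges, upper):
    """Drop tree edges below `upper`; connected components are the hard clusters."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, i, j in edges:
        if s >= upper:
            parent[find(i)] = find(j)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)


def soft_clusters(sim, cores, lower):
    """Extend each core with outside documents that reach `lower` similarity
    to at least one core member (assumed extension rule)."""
    n = len(sim)
    return [
        core
        | {
            d
            for d in range(n)
            if d not in core and any(sim[d][m] >= lower for m in core)
        }
        for core in cores
    ]
```

In this sketch a document can end up in more than one soft cluster when it is close enough to members of several cores, which is the overlap that distinguishes soft from hard clustering. For larger collections, a library routine for spanning trees would be a more practical choice than the quadratic loop shown here.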
Source
Journal of Guizhou University: Natural Sciences (《贵州大学学报(自然科学版)》)
2007, Issue 2, pp. 175-178 (4 pages)
Keywords
scientific and technical documents
feature extraction
similarity measures
soft clustering