Abstract
To address multi-label text classification, this paper proposes MLFI, a multi-label text classification algorithm based on frequent item sets. The algorithm uses FP-growth to mine frequent item sets among class labels and computes a prototype vector and a similarity threshold for each class. A text is assigned to a class when its similarity to that class's prototype vector exceeds the corresponding threshold. After classification, the association rules mined among the classes are used to verify the classification results. Experimental results show that the algorithm achieves relatively high classification performance.
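The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract leaves several details open, so the following are assumptions made only for illustration: the prototype vector is taken to be the centroid of a class's training vectors, similarity is cosine similarity, the threshold is a hypothetical `slack` factor times the lowest member-to-prototype similarity, and "verification" is read as adding any label implied by a high-confidence rule. Frequent label pairs are counted by brute force as a stand-in for FP-growth. All function and parameter names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_prototypes(docs, label_sets, slack=0.9):
    """Return {label: (prototype, threshold)}.

    Assumption: prototype = centroid of the class's training vectors;
    threshold = slack * lowest member-to-prototype similarity.
    """
    model = {}
    for label in sorted(set().union(*label_sets)):
        members = [d for d, ls in zip(docs, label_sets) if label in ls]
        proto = [sum(col) / len(members) for col in zip(*members)]
        threshold = slack * min(cosine(d, proto) for d in members)
        model[label] = (proto, threshold)
    return model


def mine_pair_rules(label_sets, min_support=0.3, min_conf=0.8):
    """Brute-force stand-in for FP-growth, limited to label pairs.

    Returns (antecedent, consequent) rules whose pair support and
    confidence meet the given minimums.
    """
    n = len(label_sets)
    single = Counter(l for ls in label_sets for l in ls)
    pairs = Counter(p for ls in label_sets for p in combinations(sorted(ls), 2))
    rules = []
    for (a, b), count in pairs.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            if count / single[x] >= min_conf:
                rules.append((x, y))
    return rules


def predict(doc, model):
    """Labels whose prototype similarity is greater than the class threshold."""
    return {l for l, (proto, t) in model.items() if cosine(doc, proto) > t}


def apply_rules(labels, rules):
    """Assumed verification step: add every label implied by a rule."""
    result = set(labels)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in result and consequent not in result:
                result.add(consequent)
                changed = True
    return result


def classify(doc, model, rules):
    """Threshold-based labeling followed by the rule-based check."""
    return apply_rules(predict(doc, model), rules)
```

A real system would likely mine item sets of any size with an FP-growth implementation and might use the rules to remove inconsistent labels as well as add missing ones; the abstract does not say which.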
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals (北大核心)
2010, Issue 15, pp. 83-85 (3 pages)
Computer Engineering
Funding
National Natural Science Foundation of China (60873100)
Natural Science Foundation of Shanxi Province (2009011017-4)
Keywords
multi-label
similarity
frequent item sets
association rules