
Multi-Label Feature Selection Algorithm Based on Information Entropy

Cited by: 62
Abstract: Multi-label classification is the learning problem in which each instance is associated with a set of labels. Feature selection can eliminate redundant and irrelevant features and is therefore an important means of improving the performance of multi-label classifiers. However, existing multi-label feature selection methods have high computational complexity and cannot produce a reasonable feature subset. To address this, a multi-label feature selection algorithm based on information entropy is proposed. The algorithm assumes that features are independent of one another and rests on two main ideas: 1) the information gain between a feature and the label set, derived from the information gain between the feature and an individual label, is used to measure the degree of correlation between them; 2) a threshold selection method is used to choose a reasonable feature subset from the original features. The algorithm first computes the information gain between each feature and the label set, then applies the threshold selection method to obtain a reasonable information gain threshold, and finally removes the irrelevant and redundant features according to that threshold, yielding a reasonable feature subset. Experiments with two different classifiers on four multi-label datasets show that the proposed algorithm effectively improves the performance of multi-label classifiers.
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2013, No. 6, pp. 1177-1184 (8 pages)
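Funding: National Science and Technology Major Project of China, "核高基" program (Core Electronic Devices, High-End Generic Chips, and Basic Software) (2012ZX03005007)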
Keywords: Internet of things; data processing; information theory; multi-label classification; feature selection; information gain; dimensionality reduction
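
The three steps named in the abstract (score each feature by its information gain with the label set, choose a threshold, drop the features below it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the formulas, so the sketch assumes discrete feature values, assumes the feature-to-label-set score is the plain sum of per-label information gains, and uses the mean score as a placeholder threshold in place of the paper's threshold selection method. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete, hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(f; l) = H(f) + H(l) - H(f, l) for two discrete columns of equal length."""
    joint = list(zip(feature, label))
    return entropy(feature) + entropy(label) - entropy(joint)


def feature_scores(X, Y):
    """Score every feature column of X against the label set Y.

    Assumption (not taken from the paper): the score of a feature is the
    sum of its information gain with each individual label column.
    """
    n_features, n_labels = len(X[0]), len(Y[0])
    label_columns = [[row[k] for row in Y] for k in range(n_labels)]
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(sum(information_gain(column, lab) for lab in label_columns))
    return scores


def select_features(X, Y, threshold=None):
    """Return (indices of kept features, all scores).

    Assumption: when no threshold is supplied, the mean score is used as a
    stand-in for the paper's threshold selection method, which the abstract
    does not specify.
    """
    scores = feature_scores(X, Y)
    if threshold is None:
        threshold = sum(scores) / len(scores)
    kept = [j for j, s in enumerate(scores) if s >= threshold]
    return kept, scores
```

Calling `select_features(X, Y)` with `X` as a list of feature rows and `Y` as a list of binary label rows returns the retained column indices together with the per-feature scores. Scoring each feature on its own, without looking at feature pairs, mirrors the independence assumption stated in the abstract and keeps the cost linear in the number of features.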


