
Improved Ensemble Method on MicroRNA Prediction Model (cited by: 1)
Abstract: Existing microRNA prediction methods often suffer from class-imbalanced data sets and from being applicable to only a single species. To address these problems, this work makes four main contributions. 1) A stratified sampling algorithm based on sequence entropy is proposed; it generates a training set with balanced numbers of positive and negative samples while preserving the overall distribution of the samples. 2) A feature selection method based on signal-to-noise ratio and correlation is proposed to reduce the size of the training set and thereby speed up training. 3) The DS-GA algorithm is proposed to shorten the parameter optimization time of the SVM classifier and to reduce overfitting. 4) Drawing on the idea of ensemble learning, a microRNA prediction model that generalizes across species is built through three steps: sampling, feature selection, and classifier parameter optimization. Experiments show that the model effectively addresses the class imbalance problem, is not limited to a single species, and achieves good prediction results on a mixed-species test set.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2018, No. 2, pp. 69-75 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61472095)
Keywords: MicroRNA; prediction; sampling; feature selection; class imbalance
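
The abstract names its first two steps without giving details, so the following is only a minimal sketch of what entropy-stratified sampling and signal-to-noise feature scoring could look like, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the use of Shannon entropy over nucleotide composition, the equal-width entropy bins, the proportional quota per bin, and the Golub-style score |mean+ - mean-| / (std+ + std-). The correlation-based part of the feature selection and the DS-GA parameter search are not sketched here.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def sequence_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of a non-empty sequence."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stratified_undersample(negatives, target_size, n_bins=4, seed=0):
    """Draw roughly target_size negatives while keeping each entropy bin's share.

    Assumption: bins are equal-width over [0, 2] bits, the entropy range of a
    4-letter alphabet (A, C, G, U). Rounding can make the result differ
    slightly from target_size.
    """
    rng = random.Random(seed)
    max_entropy = 2.0
    bins = defaultdict(list)
    for seq in negatives:
        idx = min(int(sequence_entropy(seq) / max_entropy * n_bins), n_bins - 1)
        bins[idx].append(seq)
    sample = []
    for idx in sorted(bins):
        members = bins[idx]
        # Proportional allocation, with at least one sequence per non-empty bin.
        quota = max(1, round(target_size * len(members) / len(negatives)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def snr_scores(pos_rows, neg_rows):
    """Signal-to-noise score per feature: |mean+ - mean-| / (std+ + std-)."""
    scores = []
    for pos_col, neg_col in zip(zip(*pos_rows), zip(*neg_rows)):
        gap = abs(statistics.fmean(pos_col) - statistics.fmean(neg_col))
        spread = statistics.pstdev(pos_col) + statistics.pstdev(neg_col)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero spread in both classes: infinite score if the means differ.
            scores.append(math.inf if gap > 0 else 0.0)
    return scores
```

In this sketch, calling `stratified_undersample(negatives, len(positives))` would give a negative set about the same size as the positive set, and features could then be ranked by `snr_scores` before any classifier is trained.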