Abstract
As research into the regulatory roles of soybean RNA genes deepens, using data mining to predict soybean precursor microRNA (pre-microRNA) has become an important direction in the field. To address the low recognition accuracy of the conventional random forest (RF) algorithm in pre-microRNA prediction, this study proposes and builds a soybean pre-microRNA prediction model that fuses recursive feature elimination (RFE) with RF. First, RFE is used to select the optimal feature subset of soybean pre-microRNA sequences. Then, a prediction model is built on that subset with the RF algorithm. Finally, ten-fold cross-validation is used to compare the predictions of the fused RFE-RF model with those of a single RF model and a support vector machine (SVM) classification model. The results show that the fused model is markedly more accurate, reaching 84.62%, which is 17.02% higher than the SVM model and 14.58% higher than the model built with RF alone. The method offers a new approach to predicting pre-microRNA genes in soybean.
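The workflow summarized in the abstract (RFE feature selection, an RF classifier, ten-fold cross-validation against a plain RF baseline) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic stand-ins generated by `make_classification`, and the number of trees, the number of retained features, and the elimination step are arbitrary assumptions, so the printed accuracies say nothing about the figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for pre-microRNA feature vectors (NOT the paper's data).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_redundant=2, random_state=0
)


def make_forest():
    # Hyperparameters are illustrative assumptions, not values from the paper.
    return RandomForestClassifier(n_estimators=30, random_state=0)


# RFE repeatedly fits a forest, ranks features by importance, and drops the
# weakest ones (2 per round here) until 8 remain; a fresh forest is then
# trained on the surviving features.
rfe_rf = Pipeline([
    ("rfe", RFE(estimator=make_forest(), n_features_to_select=8, step=2)),
    ("rf", make_forest()),
])

# Ten-fold cross-validation. Feature selection sits inside the pipeline, so
# it is refit on each training fold and never sees the held-out fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
rf_scores = cross_val_score(make_forest(), X, y, cv=cv, scoring="accuracy")
rfe_rf_scores = cross_val_score(rfe_rf, X, y, cv=cv, scoring="accuracy")

print(f"RF alone mean accuracy: {rf_scores.mean():.3f}")
print(f"RFE-RF mean accuracy:   {rfe_rf_scores.mean():.3f}")
```

Wrapping RFE and the classifier in one `Pipeline` is a design choice in this sketch: it keeps the feature selection from leaking information from the validation folds, which would otherwise inflate the cross-validated accuracy.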
Authors
AN Yu (安宇)
CHEN Gui-fen (陈桂芬)
LI Jing (李静)
College of Information Technology, Jilin Agricultural University, Changchun 130118, China
Source
Soybean Science (《大豆科学》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2020, Issue 3, pp. 401-405 (5 pages)
Funding
National Spark Program of China (2015GA660004)
Jilin Province Key Science and Technology R&D Project (20180201073SF)