Estimating genomic breeding values is central to genomic selection, and Bayesian and BLUP methods are the main techniques employed. In this study, we compared the Bayes A, Bayes B, Bayes Cπ and GBLUP methods on simulated data and on real data from Chinese Holstein cattle. In the simulated data, the accuracies of all methods rose similarly as the reference population size increased, but the methods responded differently to changes in marker number or QTL number. In the real Chinese Holstein data, Bayes A gave the highest accuracy for almost all six traits. GBLUP performed as well as Bayes A for milk yield, fat yield and protein yield, whereas for fat percentage, protein percentage and somatic cell score the three Bayesian methods were superior to GBLUP. Taken together, these results suggest that the accuracies of the three Bayesian methods are influenced not only by the absolute QTL number or marker number, but possibly also by the ratio of QTL number to marker number. At least one of the Bayesian methods performs better than GBLUP when the ratio of QTL number to marker number is very small or when large-effect QTL are involved.
Funding: supported by the National Natural Science Foundation of China (31371258, 31272418); the Anhui International Technology Cooperation Plan Project (1503062014); the Anhui Academy of Agricultural Sciences President Innovation Fund Project for Outstanding Youth (13B0405); the Beijing City Committee of Science and Technology Key Project (D151100004615004); the Program for Changjiang Scholar and Innovation Research Team in University (IRT1191); the Ministry of Agriculture 948 Program (2011-G2A); the National Swine Industry Technology System (CARS-36); the Anhui Swine Industry Technology System (AHCYTX-06-10); the Anhui Modern Agricultural Projects; the Anhui Finance Project for Animal Husbandry Development; the Maanshan Science and Technology Plan Projects (NY-2015-01); and the Anhui Academy of Agricultural Science and Technology Innovation Team Building Project (13C0405).
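Abstract: This study compared how well different machine learning models, obtained through automated machine learning, predict selected pig growth traits and their genomic estimated breeding values (GEBV), with the aim of identifying suitable models and optimizing genome-wide evaluation in pig breeding. Using the genomic information, pedigree matrix, fixed effects and phenotypes of 9968 pigs from multiple companies, automated machine learning was used to obtain the best model of each of four types: deep learning (DL), random forest (RF), gradient boosting machine (GBM) and extreme gradient boosting (XGB). Ten-fold cross-validation was used to predict the GEBV and the phenotypes of backfat corrected to 100 kg (B100), backfat corrected to 115 kg (B115), days corrected to 100 kg (D100) and days corrected to 115 kg (D115), and the performance of the models in pig genomic evaluation was compared. The results showed that the machine learning models predicted GEBV more accurately than trait phenotypes. For GEBV prediction, the accuracy of GBM for B100, B115, D100 and D115 was 0.683, 0.710, 0.866 and 0.871, respectively, slightly higher than that of the other methods. For phenotype prediction, the best-performing models for B100, B115, D100 and D115 were GBM (0.547), DL (0.547), XGB (0.672) and XGB (0.670), respectively. In terms of training time, RF required far more than the other three models, GBM and DL were intermediate, and XGB required the least. In summary, the models obtained through automated machine learning predicted GEBV more accurately than phenotypes; GBM overall showed the highest prediction accuracy together with a relatively short training time; XGB trained a reasonably accurate prediction model in the shortest time; and RF took far longer to train than the other three models while being insufficiently accurate, making it unsuitable for predicting pig growth trait phenotypes and GEBV.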
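The 10-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: accuracy is taken to be the Pearson correlation between held-out predictions and observed values averaged over folds (the abstract does not define it), a one-variable least-squares line stands in for the DL, RF, GBM and XGB models, the data are synthetic, and the names `pearson`, `fit_line` and `kfold_accuracy` are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares fit of y = a + b*x (a stand-in for a real model)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def kfold_accuracy(x, y, k=10, seed=0):
    """Mean over k folds of the correlation between held-out predictions and targets."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [a + b * x[i] for i in test]  # predict only unseen records
        scores.append(pearson(pred, [y[i] for i in test]))
    return statistics.fmean(scores)


# Synthetic data only: a noisy linear signal, not the pig data set.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
accuracy = kfold_accuracy(x, y)
print(round(accuracy, 3))
```

Each record is predicted exactly once by a model that never saw it during fitting, which is what makes the averaged correlation an out-of-sample estimate.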