Abstract
Recognizing human gene promoters is a basic requirement of medical research. Features were extracted from DNA sequences: phase-specific PZ curve features of the nucleotides, dinucleotide spatial structure features, conserved-signal likelihood scores, and K-mer likelihood scores, combined with GC content variation and an inhomogeneity index. On these features, a support vector machine (SVM) algorithm based on particle swarm optimization (PSO) was constructed to identify human gene promoters. Optimizing the SVM parameters with PSO avoids the arbitrariness of manual parameter selection and showed good robustness in classification. In 10-fold cross-validation on the test set, sensitivity was 92%, specificity was 91%, and the Matthews correlation coefficient (MCC) was 0.83. These results indicate that the PSO-SVM algorithm can effectively identify promoter sequences.
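As a minimal sketch of how the three reported metrics relate, the snippet below computes sensitivity, specificity, and MCC from confusion-matrix counts. The counts are hypothetical: the abstract does not give the data set sizes, so a balanced set of 1000 promoters and 1000 non-promoters is assumed purely for illustration, and the function name `confusion_metrics` is likewise an illustrative choice.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true promoters recovered
    specificity = tn / (tn + fp)  # fraction of non-promoters correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts (assumption): 1000 promoters, 1000 non-promoters,
# chosen to match the reported 92% sensitivity and 91% specificity.
sn, sp, mcc = confusion_metrics(tp=920, fn=80, tn=910, fp=90)
print(round(sn, 2), round(sp, 2), round(mcc, 2))  # prints: 0.92 0.91 0.83
```

Under this balanced-class assumption, 92% sensitivity and 91% specificity give an MCC of about 0.83, which is consistent with the reported value. With unbalanced classes the same sensitivity and specificity would yield a different MCC.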
Source
Journal of Anhui Agricultural University (《安徽农业大学学报》)
CAS
CSCD
PKU Core (北大核心)
2015, Issue 2, pp. 310-315 (6 pages)
Funding
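Jointly supported by:
Doctoral Program Fund of the Ministry of Education (20100097110040)
Fundamental Research Funds for the Central Universities (KYZ201125)
Natural Science Foundation of Jiangsu Province (BK20140676, BK20141358)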
Keywords
phase-specific PZ curve
particle swarm optimization (PSO)
support vector machine (SVM)
promoter prediction