Abstract
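【Objective】To analyze codon usage bias in the coding sequences (CDS) of the pineapple [Ananas comosus (L.) Merr.] genome, in order to provide a theoretical basis for understanding the rules of codon usage in pineapple and for molecular modification, and to promote biological research on plant codons.【Methods】Using the 30 663 CDS obtained from pineapple genome sequencing as the data source, codon usage bias, codon pairs (dicodons) and multivariate statistics were analyzed with custom Perl scripts, CUSP and SPSS.【Results】The mean GC content of the CDS in the pineapple genome data was 52.09%, the mean GC content at the third codon position (GC3S) was 55.41%, and the effective number of codons (ENC) was 58.41, with the vast majority of ENC values above 35. In addition, 34 high-frequency codons (RSCU greater than 1) were identified, of which only 8 ended in A or T and 25 ended in C or G; 31 high-expression superior codons were also identified. Combining these results, 13 optimal codons were finally selected. Comparison of GC3S and codon usage frequencies with 17 plant species showed that dicotyledons and monocotyledons differ considerably in GC3S and codon usage frequency, and that pineapple is closer to the dicotyledons than the other monocotyledons are.【Conclusion】Codon usage bias in pineapple was analyzed at three levels (different genes, different positions within genes, and different plants), and 13 optimal codons were selected. This study contributes to a better understanding of codon usage bias in pineapple and promotes research on plant codon biology and the potential application of genome data in non-model plants.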
【Objective】Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons). Over a long course of evolution, each species has formed its own codon usage pattern. Pineapple [Ananas comosus (L.) Merr.] is a nutrient-dense fruit with strong consumer demand and high commercial value. However, little is known about the rules of pineapple codon usage. The aim of the present study was to investigate codon usage patterns in the genome sequencing data of pineapple in order to provide guidance for genetic transformation, new gene discovery, regulation of functional gene expression, prediction of protein structure and function, comparative genomics with other species, and molecular breeding in pineapple.【Methods】Data were obtained from the JGI database. We analyzed the 30 663 genes in the pineapple genome sequencing data with Perl scripts and SPSS software, calculating GC content, the effective number of codons (ENC), relative synonymous codon usage (RSCU) and codon pairs. The RSCU value is the observed frequency of a codon relative to the frequency expected if all synonymous codons for the same amino acid were used equally. In the absence of codon usage preference, the RSCU of each synonymous codon is 1. When the RSCU of a codon is over 1, the codon is defined as a high-frequency codon, indicating that it is used more often than its synonymous codons and that the genes have a preference for it. The ENC value describes the degree to which codon usage deviates from random selection and reflects the degree of preference among synonymous codons within a codon family. The smaller the ENC value, the higher the expression level of the corresponding endogenous gene.
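The RSCU definition above can be illustrated with a minimal Python sketch. This is not the authors' Perl script: the function name `rscu`, the use of the standard genetic code, and the choice to skip stop codons and single-codon amino acids (Met, Trp) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over an iterable of in-frame CDS strings."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with N or other symbols
                counts[codon] += 1

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) < 2:  # assumption: skip stops, Met, Trp
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid never observed; RSCU undefined
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values
```

Calling `rscu(["TTTTTTTTC"])` pools two TTT codons and one TTC codon for phenylalanine, so TTT receives a value above 1 and TTC a value below 1, matching the high-frequency criterion described above.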
According to the ENC of each gene, the RSCU values of genes with high and low expression levels were obtained. If the RSCU difference between the high- and low-expression genes was over 0.08, the corresponding codon was defined as a high-expression superior codon. A codon that was both a high-frequency codon and a high-expression superior codon was defined as an optimal codon. The pineapple genes were imported into CUSP software to calculate codon usage frequencies. Genome data of Carica papaya, Glycine max, Arabidopsis thaliana, Ricinus communis, Prunus mume, Prunus persica, Cucumis sativus, Cajanus cajan, Oryza sativa, Brassica rapa, Citrus sinensis, Brachypodium distachyon, Populus trichocarpa, Theobroma cacao, Vitis vinifera, Sorghum bicolor and Zea mays were retrieved from the JGI database, and the codon usage frequencies of pineapple were compared with those of these species. If the difference in frequencies between two species was in the range of 0.5-2.0, the codon preferences of the two species were considered relatively close.【Results】The GC content of pineapple genes was 52.09% and the GC content at the third codon position was 55.41%, which indicated that the GC3S content (the GC content of the third nucleotide of synonymous codons) of pineapple genes showed no obvious codon usage bias (CUB). The ENC over all genes was 58.41, and the majority of ENC values were over 35, indicating that the CUB of pineapple genes was weak.
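The GC and GC3S quantities reported above can be sketched as follows. This is an illustrative Python example, not the authors' pipeline: the function names are hypothetical, and it assumes the common convention that GC3S excludes Met (ATG), Trp (TGG) and the three stop codons, which the abstract does not state explicitly.

```python
# Codons with no synonymous alternative (Met, Trp) plus the stop codons.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases over the whole sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3s(seq):
    """Fraction of G and C at the third position of synonymous codons."""
    seq = seq.upper()
    thirds = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in NON_SYNONYMOUS or not set(codon) <= set("ACGT"):
            continue  # skip Met, Trp, stops and ambiguous codons
        thirds.append(codon[2])
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)
```

For the toy sequence `"ATGGCCTTTTAA"`, only GCC and TTT count toward GC3S, so one of the two third positions is G or C.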
In addition, 34 codons with RSCU over 1 were defined as high-frequency codons (CTC, TTG, CTT, AGG, CGC, AGA, CGG, TCC, TCT, AGC, TCG, GTG, GTT, GTC, GGC, GGG, ACC, ACT, CCT, CCG, CCC, ATC, ATT, GCC, GCG, TGC, AAG, GAG, TTC, GAT, TAC, CAG, CAC, AAT); only 8 of them ended with A or T and 25 ended with G or C, which indicated that pineapple genes prefer codons ending in C or G. At the same time, 31 high-expression superior codons were obtained, and on this basis 13 optimal codons were identified: AGG, AGA, TCT, CTT, TTG, GTT, CCT, ACT, ATT, GAT, AAT, TTT and TAT. We also analyzed codon pair (dicodon) usage for the 20 amino acids. Comparison with 17 other species showed that the codon usage patterns of monocotyledonous plant genes differed greatly from those of dicotyledonous plant genes, and that pineapple is closer to dicotyledonous plants.【Conclusion】Thirteen optimal codons were selected through the analysis of codon bias in Ananas comosus, which provides a basis for gene optimization and for predicting the function of some genes of unknown function in pineapple.
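The selection rule stated in the Methods (an optimal codon is both a high-frequency codon, with RSCU over 1, and a high-expression superior codon, with an RSCU difference over 0.08 between high- and low-expression genes) can be sketched in Python as below. The function name `optimal_codons` and its parameters are hypothetical. How genes are split into high- and low-expression groups by ENC is not specified in the abstract, so the sketch takes the two RSCU tables as inputs and leaves that split to the caller.

```python
def optimal_codons(rscu_all, rscu_high, rscu_low,
                   rscu_cutoff=1.0, delta_cutoff=0.08):
    """Return codons that pass both criteria, sorted alphabetically.

    rscu_all  -- {codon: RSCU} over all genes
    rscu_high -- {codon: RSCU} over the high-expression gene group
    rscu_low  -- {codon: RSCU} over the low-expression gene group
    """
    # Criterion 1: high-frequency codons (RSCU above the cutoff overall).
    high_frequency = {c for c, v in rscu_all.items() if v > rscu_cutoff}
    # Criterion 2: RSCU in high-expression genes exceeds that in
    # low-expression genes by more than the delta cutoff.
    superior = {
        c for c in rscu_high
        if c in rscu_low and rscu_high[c] - rscu_low[c] > delta_cutoff
    }
    return sorted(high_frequency & superior)
```

Keeping the two cutoffs as parameters makes it easy to check how sensitive the resulting codon set is to the 0.08 threshold.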
Source
《果树学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2017, Issue 8, pp. 946-955 (10 pages)
Journal of Fruit Science
Funding
National Natural Science Foundation of China (31260460)
Hainan Province Key Research and Development Project (ZDYF2016035)