Abstract
The features of the sequences flanking the start and stop codons of eukaryotic genes are important for defining the open reading frame (ORF) in cDNA sequences and for predicting the coding region (CDS) in genomic DNA sequences. Using high-quality cDNA sequences from the RefSeq database, we analyzed the "Kozak rule" for sequences flanking the start codon on a large scale. The rule was further confirmed, although it differs somewhat between species. We also analyzed the statistical features of the sequences flanking each of the three stop codons and derived a regular expression for each stop codon. Finally, because tandem in-frame start codons and tandem in-frame stop codons occur in many genes, their biological significance is discussed.
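As a rough illustration of the three kinds of counting the abstract describes (base frequencies around the start codon, the context after each stop codon, and tandem in-frame codons), the following is a minimal Python sketch. It makes several assumptions. The record layout `(seq, cds_start, cds_stop)`, all function names, and the toy sequence are hypothetical. Positions use Kozak numbering, in which the A of ATG is +1 and the base before it is -1. The pattern `KOZAK_STRONG` is the textbook Kozak strong-context consensus (a purine at -3 and G at +4), not one of the regular expressions reported in the paper.

```python
import re
from collections import Counter, defaultdict

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Textbook Kozak "strong" context over positions -3..+4 (A of ATG = +1):
# a purine at -3, any two bases, ATG, then G at +4. This is an assumption for
# illustration, NOT one of the regular expressions derived in the paper.
KOZAK_STRONG = re.compile(r"[AG]..ATGG")


def start_flank_counts(records, upstream=6):
    """Count bases at Kozak positions -upstream..-1 and +4.

    Each record is a hypothetical (seq, cds_start, cds_stop) tuple: seq is a
    cDNA string in DNA letters, cds_start is the 0-based index of the A of the
    annotated ATG, and cds_stop is the 0-based index of the stop codon's first base.
    """
    counts = defaultdict(Counter)
    for seq, cds_start, _ in records:
        if cds_start < upstream or cds_start + 4 > len(seq):
            continue  # flank runs off the sequence; skip this record
        for offset in range(1, upstream + 1):
            counts[-offset][seq[cds_start - offset]] += 1
        counts[4][seq[cds_start + 3]] += 1  # base right after ATG is +4
    return counts


def is_strong_kozak(seq, cds_start):
    """True if the window -3..+4 around the start codon matches KOZAK_STRONG."""
    if cds_start < 3:
        return False
    return KOZAK_STRONG.fullmatch(seq[cds_start - 3:cds_start + 4]) is not None


def stop_flank_counts(records):
    """Tally the base immediately after the stop codon, grouped by stop codon."""
    counts = defaultdict(Counter)
    for seq, _, cds_stop in records:
        stop = seq[cds_stop:cds_stop + 3]
        if stop in STOP_CODONS and cds_stop + 3 < len(seq):
            counts[stop][seq[cds_stop + 3]] += 1
    return counts


def tandem_run(seq, pos, codons):
    """Count consecutive in-frame codons, starting at pos, that belong to codons.

    A result greater than 1 at the annotated start (codons={"ATG"}) or at the
    annotated stop (codons=STOP_CODONS) marks a tandem in-frame repeat.
    """
    n = 0
    while pos + 3 <= len(seq) and seq[pos:pos + 3] in codons:
        n += 1
        pos += 3
    return n


if __name__ == "__main__":
    # Invented toy cDNA, not RefSeq data: GCCACC | ATG GCT GCA | TAA TAG | CC
    toy = [("GCCACCATGGCTGCATAATAGCC", 6, 15)]
    seq = toy[0][0]
    print(is_strong_kozak(seq, 6))           # True: A at -3, G at +4
    print(tandem_run(seq, 15, STOP_CODONS))  # 2: TAA followed in frame by TAG
    print(dict(stop_flank_counts(toy)))
```

Dividing each position's counts by the number of usable records turns them into per-position base frequencies. Such frequencies could then be compared species by species, in the spirit of the analysis summarized above.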
Source
Chinese Journal of Bioinformatics (《生物信息学》)
2004, No. 4, pp. 10-14 (5 pages)
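Funding
National Key Basic Research and Development Program of China (973 Program), grant 2003CB715900
National High Technology Research and Development Program of China (863 Program), grant 2002AA234021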