Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variation were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
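To make the terms "core genes" and "unique genes" concrete, here is a minimal Python sketch that counts them from per-strain gene sets. It is an illustration only: the strain names and gene labels are made up, the function name `core_and_unique` is hypothetical, and the abstract's figures (4196 core genes, an asymptotic 558 unique genes per added genome) come from an extrapolation over real ortholog clusters, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def core_and_unique(gene_sets):
    """Split a {strain: set_of_gene_families} mapping into core and strain-specific genes.

    Core genes occur in every strain; unique genes occur in exactly one strain.
    """
    # Count in how many strains each gene family is present.
    presence = Counter(gene for genes in gene_sets.values() for gene in set(genes))
    n_strains = len(gene_sets)
    core = {gene for gene, count in presence.items() if count == n_strains}
    unique = {
        strain: {gene for gene in genes if presence[gene] == 1}
        for strain, genes in gene_sets.items()
    }
    return core, unique


# Toy input (hypothetical strains and gene families, not data from the study).
toy_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g5"},
}
core_genes, unique_genes = core_and_unique(toy_strains)
```

In an open pangenome, the per-strain unique sets keep contributing new genes as strains are added; in a closed one they shrink toward zero.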
The Human Genome Project (HGP) has paved the way for Digital Personal Genomes (DPG), whereby a person's complete genome sequence serves as the primary entry for Digital Healthcare Systems (DHS). If the goal is to deliver DPG on demand, affordability, mainly the cost, becomes one of the primary issues. However, as a once-in-a-lifetime event, a total cost of US$100 is close to a contribution of $1 per year. Therefore, what remains is still largely an engineering challenge, since a multi-fold reduction of the current per-genome sequencing cost may be achievable by increasing the scale of operation and the degree of automation over 5–10 years. The second issue is to differentiate scientific achievements from what is applicable to healthcare systems and human well-being in general. The original idea behind the HGP was to understand the genetics of cancers, a proposal that was made and debated in the early 1980s, and it is still clear that we need both high-quality genome sequences and time to fully understand their encoded biological information; the success of the HGP lies in the separation of these two goals.
An altered pattern of epigenetic modifications, such as DNA methylation and histone modification, is critical to many common human diseases, including cancer. Recently, mitochondrial DNA (mtDNA) was reported to be associated with tumorigenesis through epigenetic regulation of methylation patterns. One of the promising approaches to study DNA methylation and CpG islands (CGIs) is sequencing and analysis of clones derived from the physical library generated by methyl-CpG-binding domain proteins and restriction enzyme MseI. In this study, we observed that the most redundant sequences of 349 clones in a human CGI library were all generated from the human mitochondrial genome. Further analysis indicated that there was a 5,845-bp DNA transfer from mtDNA to chromosome 1, and all the clones should be the products of a 510-bp MseI fragment, which contained a putative CGI of 270 bp. The 510-bp fragment was annotated as part of cytochrome c oxidase subunit II (COXII), and phylogenetic analysis of homologous sequences containing COXII showed three DNA transfer events from mtDNA to nuclear genome, one of which underwent secondary transfer events between different chromosomes. These results may further our understanding of how the mtDNA regulates DNA methylation in the nucleus.
To identify genes associated with glioblastoma differentiation, ESTs that are differentially expressed between control cells and the human glioblastoma cell line BT-325 induced to differentiate by all-trans retinoic acid (RA) were isolated by DDRT-PCR. Of the 46 ESTs sequenced, 19 are from new genes. A full-length 1,535-bp cDNA, termed GDR1, was isolated from a human cDNA library using a probe designed according to one of the novel ESTs, HGBB098. The open reading frame of the GDR1 gene encodes a putative protein containing 334 amino acid residues. A BLAST search against the current GenBank DNA and protein sequence databases did not reveal significant homology with any known proteins. RT-PCR shows that the GDR1 mRNA level increased in differentiated BT-325 cells after treatment with RA. The different expression patterns of GDR1 mRNA in human tissues were detected by multiple tissue Northern blot hybridization.
Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures, as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) which are present in germline cells were defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic datasets. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation, where highly-expressed genes accumulate more mutations than lowly-expressed genes. Furthermore, we found four major types of transcript-centric mutations (C→T, A→G, C→[?] and G→T) in human genomes and identified a negative gradient of the sequence variations aligning from the 5' end to the 3' end of the transcription units (TUs). The periodical occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution along with creation of new genes, gene/genome duplication, and horizontal gene transfer.
Cold seeps in the deep sea are closely linked to energy exploration as well as global climate change. The alkane-dominated, chemical energy-driven model makes cold seeps an oasis of deep-sea life, showcasing an unparalleled reservoir of microbial genetic diversity. Here, by analyzing 113 metagenomes collected from 14 global sites across 5 cold seep types, we present a comprehensive Cold Seep Microbiomic Database (CSMD) to archive the genomic and functional diversity of cold seep microbiomes. The CSMD includes over 49 million non-redundant genes and 3175 metagenome-assembled genomes, which represent 1895 species spanning 105 phyla. In addition, beta diversity analysis indicates that both the sampling site and cold seep type have a substantial impact on the prokaryotic microbiome community composition. Heterotrophic and anaerobic metabolisms are prevalent in microbial communities, accompanied by considerable mixotrophs and facultative anaerobes, highlighting the versatile metabolic potential in cold seeps. Furthermore, secondary metabolic gene cluster analysis indicates that at least 98.81% of the sequences potentially encode novel natural products, with ribosomally synthesized and post-translationally modified peptides being the predominant type widely distributed in archaea and bacteria. Overall, the CSMD represents a valuable resource that would enhance the understanding and utilization of global cold seep microbiomes.
Motifs, defined as short, conserved sequences with biological functionality, play a crucial role in various biological processes. Within the same protein family, different subfamilies may contain distinct combinations of motifs, and the composition of multiple motifs in a specific order can characterize the functionality of various proteins (Lu et al., 2020).
In plants, microRNA (miRNA) functions in the post-transcriptional repression of target mRNAs have been well explored. However, the mechanisms regulating the accumulation of miRNAs remain poorly understood. Here, we report that distinct mechanisms regulate accumulation of a monocot-specific miRNA, rice (Oryza sativa) miR528. At the transcriptional level, miR528 accumulated to higher levels in older plants than in young seedlings and exhibited aging-modulated gradual accumulation and diurnal rhythms in leaves; at the post-transcriptional level, aging also modulated miR528 levels by enhancing pri-miR528 alternative splicing. We found that miR528 promotes rice flowering under long-day conditions by targeting RED AND FAR-RED INSENSITIVE2 (OsRFI2). Moreover, natural variations in the MIR528 promoter region caused differences in miR528 expression among rice varieties, which are correlated with their different binding affinities with the transcription factor OsSPL9 that activates the expression of miR528. Taken together, our findings reveal that rice plants have evolved sophisticated modes of fine-tuning miR528 levels and provide insight into the mechanisms that regulate MIRNA expression in plants.
Global concern has been raised about the potential hazards of traditional herbal medicinal products (THMPs). Substandard and counterfeit THMPs, including traditional Chinese patent medicines, health foods, and dietary supplements, are potential threats to public health. Recent marketplace studies using DNA barcoding have determined that current quality control methods are not sufficient for ensuring the presence of authentic herbal ingredients or for detecting contaminants/adulterants. An efficient biomonitoring method for THMPs is greatly needed. Herein, metabarcoding and single-molecule, real-time (SMRT) sequencing were used to detect the multiple ingredients in Jiuwei Qianghuo Wan (JWQHW), a classical herbal prescription widely used in China for the last 800 years. Reference experimental mixtures and commercial JWQHW products from the marketplace were used to confirm the method. Successful SMRT sequencing runs recovered 5416 and 4342 circular-consensus sequencing (CCS) reads belonging to the ITS2 and psbA-trnH regions. The results suggest that the combination of metabarcoding and SMRT sequencing is repeatable, reliable, and sensitive enough to detect species in THMPs, and that the error in SMRT sequencing did not affect the ability to identify multiple prescribed species and several adulterants/contaminants. The approach has the potential to become a valuable tool for the biomonitoring of multi-ingredient THMPs.
With the development of genomics and bioinformatics, especially the extensive application of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, lncRNAs have been considered important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we summarize recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants.
Ovary development is a complex process involving numerous genes. A well-developed ovary is essential for females to maintain fertility and reproduce offspring. To gain better insight into the molecular mechanisms related to mammalian ovary development, we performed a comparative transcriptomic analysis on ovaries isolated from infant and adult mice using next-generation sequencing technology (SOLiD). We identified 15,454 and 16,646 transcriptionally active genes at the infant and adult stage, respectively. Among these genes, we also identified 7021 differentially expressed genes. Our analysis suggests that, in general, the adult ovary has a higher level of transcriptomic activity. However, it appears that genes related to primordial follicle development, such as those encoding Figla and Nobox, are more active in the infant ovary, whereas expression of genes vital for follicle development, such as Gdf9, Bmp4 and Bmp15, is upregulated in the adult. These data suggest a dynamic shift in gene expression during ovary development, and it is apparent that these changes function to facilitate follicle maturation when additional functional gene studies are considered. Furthermore, our investigation has also revealed several important functional pathways, such as apoptosis, MAPK and steroid biosynthesis, that appear to be much more active in the adult ovary than in the infant ovary. These findings will provide a solid foundation for future studies on ovary development in mice and other mammals and help to expand our understanding of the complex molecular and cellular events that occur during postnatal ovary development.
Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to adopt these precious data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system, IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
Metabotropic glutamate receptor 7 (mGluR7), coupled with the chemical neurotransmitter L-glutamate, plays an important role in the development of many psychiatric and neurological disorders. To study the biological and genetic mechanisms of mGluR7-related diseases, a physical map covering the full-length mGluR7 genomic sequence has been constructed through seed clone screening and fingerprinting database searching. The BAC clones in the physical map have been sequenced with a shotgun strategy and assembled by Phred-Phrap-Consed software; the error rate of the final genomic sequence is less than 0.01%. mGluR7 spans an 880 kb genomic region, and the GC content and repeat content of the mGluR7 genomic sequence are 38% and 37.5%, respectively. mGluR7 has a typical 'house-keeping' promoter and consists of 11 exons, with introns ranging from 6 kb to 285 kb. mGluR7a and mGluR7b are two known alternative splicing variants. Comparing the genomic structures of extracellular domains of mGluR family, their genomic structures can...
Myeloid leukemias are highly diverse diseases and have been shown to be associated with microRNA (miRNA) expression aberrations. The present study involved an in-depth miRNome analysis of two human acute myeloid leukemia (AML) cell lines, HL-60 and THP-1, and one human chronic myeloid leukemia (CML) cell line, K562, via massively parallel signature sequencing. mRNA expression profiles of these cell lines that were established previously in our lab facilitated an integrative analysis of miRNA and mRNA expression patterns. miRNA expression profiling followed by differential expression analysis and target prediction suggested numerous miRNA signatures in AML and CML cell lines. Some miRNAs may act as either tumor suppressors or oncomiRs in AML and CML by targeting key genes in AML and CML pathways. Expression patterns of cell type-specific miRNAs could partially reflect the characteristics of K562, HL-60 and THP-1 cell lines, such as actin filament-based processes, responsiveness to stimulus and phagocytic activity. miRNAs may also regulate myeloid differentiation, since they usually suppress differentiation regulators. Our study provides a resource to further investigate the employment of miRNAs in human leukemia subtyping, leukemogenesis and myeloid development. In addition, the distinctive miRNA signatures may be potential candidates for the clinical diagnosis, prognosis and treatment of myeloid leukemias.
Funding: Supported by the National Key R&D Program of China (2018YFD1000606-4), the Beijing Academy of Agriculture and Forestry Fund for Young Scholars (QNJJ201702, QNJJ201925), the National Natural Science Foundation of China (31401836), and the Municipal Natural Science Foundation of Beijing (6162012).
Abstract: Apricots, scientifically known as Prunus armeniaca L., are drupes that resemble and are closely related to peaches and plums. As one of the most consumed fruits, apricots are grown on every continent except Antarctica. A high-quality reference genome for apricot has been unavailable, which has dramatically limited the elucidation of the associations of phenotypes with the genetic background, evolutionary diversity, and population diversity in apricot. DNA from P. armeniaca was used to generate a standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on Sequel SMRT Cells, generating a total of 16.54 Gb of PacBio subreads (N50 = 13.55 kb). The high-quality P. armeniaca reference genome presented here was assembled using long-read single-molecule sequencing at approximately 70× coverage and 171× Illumina reads (40.46 Gb), combined with a genetic map for chromosome scaffolding. The assembled genome size was 221.9 Mb, with a contig NG50 size of 1.02 Mb. Scaffolds covering 92.88% of the assembled genome were anchored on eight chromosomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed 98.0% complete genes. We predicted 30,436 protein-coding genes, and 38.28% of the genome was predicted to be repetitive. We found 981 contracted gene families, 1324 expanded gene families and 2300 apricot-specific genes. The differentially expressed gene (DEG) analysis indicated that a change in the expression of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene, but not the lycopene beta-cyclase (LcyB) gene, results in a low β-carotenoid content in the white cultivar “Dabaixing”. This complete and highly contiguous P. armeniaca reference genome will be of help for future studies of resistance to plum pox virus (PPV) and for the identification and characterization of important agronomic genes and breeding strategies in apricot.
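The assembly statistics quoted in this abstract (N50, NG50, fold coverage) follow simple definitions, sketched below in Python. This is an illustrative sketch, not the authors' pipeline: the function names `nx50` and `fold_coverage` and the toy contig lengths are assumptions introduced here. N50 is the length of the contig at which the running total of sorted lengths first reaches half of the assembly size; NG50 uses half of a reference (estimated) genome size instead.

```python
def nx50(lengths, reference_size=None):
    """Return N50, or NG50 when reference_size (e.g. an estimated genome size) is given."""
    total = reference_size if reference_size is not None else sum(lengths)
    target = total / 2
    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the sequences cover less than half of reference_size


def fold_coverage(total_bases, genome_size):
    """Sequencing depth expressed as a multiple of the genome size."""
    return total_bases / genome_size


# Toy contig lengths (made up, arbitrary units).
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = nx50(toy_contigs)                        # half of 150 is reached at the 40-unit contig
toy_ng50 = nx50(toy_contigs, reference_size=200)   # half of 200 is reached at the 30-unit contig

# Depth relative to the assembled size reported in the abstract (221.9 Mb).
pacbio_depth = fold_coverage(16.54e9, 221.9e6)
illumina_depth = fold_coverage(40.46e9, 221.9e6)
```

Dividing by the assembled size gives roughly 75× and 182×, somewhat above the quoted 70× and 171×. One plausible reading, which the abstract does not state, is that the authors normalised by a larger estimated genome size.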
Abstract: MicroRNAs (miRNAs) are endogenous 22-nt RNAs that play important regulatory roles through post-transcriptional gene silencing. A computational strategy has been developed for the identification of conserved miRNAs, based on features of known metazoan miRNAs, in the red flour beetle (Tribolium castaneum), which is regarded as one of the major laboratory models of arthropods. Among 118 putative miRNAs, 47% are harbored by known protein-coding genes (intronic miRNAs) and 53% are located outside such genes (intergenic miRNAs). There are 31 intronic miRNAs in the same transcriptional orientation as their host genes, which may share RNA polymerase II and spliceosomal machinery with their host genes for their biogenesis. A hypothetical feed-back model has been proposed based on the analysis of the relationship between intronic miRNAs and their host genes in the development of the red flour beetle.
Abstract: Maize is a globally important crop that was a classic model plant for genetic studies. Here, we report a 2.2 Gb draft genome sequence of an elite maize line, HuangZaoSi (HZS). Hybrids bred from HZS-improved lines (HILs) are planted in more than 60% of maize fields in China. Proteome clustering of six completely sequenced maize genomes shows that 638 proteins fall into 264 HZS-specific gene families, with the majority of contributions from tandem duplication events. Resequencing and comparative analysis of 40 HZS-related lines reveals the breeding history of HILs. More than 60% of identified selective sweeps were clustered in identity-by-descent conserved regions, and yield-related genes/QTLs were enriched in HZS characteristic selected regions. Furthermore, we demonstrated that HZS-specific family genes were not uniformly distributed in the genome but enriched in improvement/function-related genomic regions. This study provides an important and novel resource for maize genome research and expands our knowledge of the breadth of genomic variation and the improvement history of maize.
Funding: Supported by grants from the National Basic Research Program (973 Program; 2006CB910401 and 2006CB910403) awarded to JY and SH.
Abstract: Here, we evaluate the contribution of two major biological processes, DNA replication and transcription, to mutation rate variation in human genomes. Based on analysis of public human tissue transcriptomics data, a high-resolution replication map of HeLa cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. The SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to be located in late-replicating genomic regions, and genes in such regions have a higher SNP density than those in early-replicating regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 30972263 and 30771644) and the Natural Science Foundation of Hunan Province, China (Grant Nos. 09JJ6037 and 08jj3064).
Abstract: To enrich the genomic information of this commercially important fish species, we obtained 5,063 high-quality expressed sequence tags (ESTs) from the muscle cDNA database of the mandarin fish (Siniperca chuatsi). Clustering analysis yielded 1,625 unique sequences, including 443 contigs (from 3,881 EST sequences) and 1,182 singletons. BLASTX searches showed that 959 unique sequences shared homology with proteins in the NCBI non-redundant database. A total of 740 unique sequences were functionally annotated using Gene Ontology. The 1,625 unique sequences were assigned to Kyoto Encyclopedia of Genes and Genomes reference pathways, and the results indicated that transcripts participating in nucleotide metabolism and amino acid metabolism are relatively abundant in S. chuatsi. We also identified 15 genes that are abundantly expressed in the muscle of the mandarin fish. These genes are involved in muscle structural formation and in the regulation of muscle differentiation and development. The most remarkable gene in S. chuatsi is nucleoside diphosphate kinase B, which is represented by 449 EST sequences, accounting for 8.86% of the total EST sequences. Our work provides a profile of transcripts expressed in the white muscle of the mandarin fish, laying a foundation for a better understanding of fish genomics.
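The EST tallies reported in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the figures are copied from the abstract rather than recomputed from any sequence data.

```python
# Figures copied from the abstract above; all variable names are illustrative.
total_ests = 5063           # high-quality ESTs
contigs = 443               # contigs produced by clustering
ests_in_contigs = 3881      # ESTs that were merged into those contigs
singletons = 1182           # ESTs that did not cluster with any other EST
top_transcript_ests = 449   # ESTs assigned to the most abundant transcript

# Unique sequences are the contigs plus the singletons.
unique_sequences = contigs + singletons

# Every EST is either part of a contig or a singleton.
accounted_ests = ests_in_contigs + singletons

# Share of all ESTs taken by the most abundant transcript, in percent.
top_transcript_share = 100 * top_transcript_ests / total_ests

print(unique_sequences, accounted_ests, round(top_transcript_share, 2))
```

The two sums reproduce the 1,625 unique sequences and 5,063 ESTs stated in the abstract. The computed share is about 8.87% when rounded to two decimals, which is consistent with the quoted 8.86% if that value was truncated rather than rounded.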
Funding: Supported by a grant from King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia (No. KACST 428-29); an institutional grant from the CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences; grants from the National Basic Research Program (973 Program) (No. 2010CB126604) and the Special Foundation Work Program (No. 2009FY120100), Ministry of Science and Technology of the People's Republic of China; and the National Science Foundation of China (No. 31071163).
Abstract: Bacillus thuringiensis (B. thuringiensis) is a soil-dwelling Gram-positive bacterium, and its plasmid-encoded toxins (Cry) are commonly used as biological alternatives to pesticides. In a pangenomic study, we sequenced seven B. thuringiensis isolates at both high coverage and high base quality using a next-generation sequencing platform. The B. thuringiensis pangenome was extrapolated to have 4196 core genes and an asymptotic value of 558 unique genes when a new genome is added. Compared to the pangenomes of closely related species of the same genus, the B. thuringiensis pangenome shows an open characteristic, similar to B. cereus but not to B. anthracis; the latter has a closed pangenome. We also found extensive divergence among the seven B. thuringiensis genome assemblies, which harbor ample repeats and single nucleotide polymorphisms (SNPs). The identities among orthologous genes are greater than 84.5%, and the hotspots for genome variation were discovered in genomic regions of 2.3-2.8 Mb and 5.0-5.6 Mb. We concluded that high-coverage sequence assemblies from multiple strains, before all the gaps are closed, are very useful for pangenomic studies.
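The core and unique gene counts described in the abstract above rest on comparing gene content across isolates. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: it assumes genes have already been grouped into ortholog families, and the function name, isolate labels, and family identifiers are all hypothetical. The asymptotic extrapolation mentioned in the abstract would additionally need curve fitting over many genome orderings, which is not shown here.

```python
def pangenome_summary(genomes):
    """Summarise gene-family content across genomes.

    genomes: dict mapping a genome name to an iterable of gene-family IDs.
    Returns (core, pan, new_per_genome), where core is the set of families
    present in every genome, pan is the union of all families, and
    new_per_genome lists how many previously unseen families each genome
    contributes when genomes are added in dictionary order.
    """
    family_sets = {name: set(families) for name, families in genomes.items()}
    if not family_sets:
        return set(), set(), []

    # Core genome: families shared by every genome.
    core = set.intersection(*family_sets.values())

    # Pan genome: grows as each genome is added; track the new families.
    pan = set()
    new_per_genome = []
    for name, families in family_sets.items():
        new_families = families - pan
        new_per_genome.append((name, len(new_families)))
        pan |= families
    return core, pan, new_per_genome


# Toy input with made-up isolates and family IDs.
toy = {
    "isolate_1": ["famA", "famB", "famC"],
    "isolate_2": ["famA", "famB", "famD"],
    "isolate_3": ["famA", "famB", "famD", "famE"],
}
core, pan, new_counts = pangenome_summary(toy)
print(sorted(core), len(pan), new_counts)
```

In an open pangenome, the per-genome count of new families stays above zero as more genomes are added; in a closed one it falls toward zero. The order of addition affects the individual counts, so real analyses typically average over many permutations.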
Abstract: The Human Genome Project (HGP) has paved the way for Digital Personal Genomes (DPG), whereby a person’s complete genome sequence serves as the primary entry for Digital Healthcare Systems (DHS). If the goal is to deliver DPG on demand, affordability, mainly the cost, becomes one of the primary issues. However, as a once-in-a-lifetime event, a total cost of US$100 is close to a $1 per year contribution. Therefore, what remains is still largely an engineering challenge, since a multi-fold reduction of the current per-genome sequencing cost may be achievable by increasing the scale of operation and the degree of automation over 5–10 years. The second issue is to differentiate scientific achievements from what is applicable to healthcare systems and human well-being in general. The original thought behind the HGP was to understand the genetics of cancers, an idea proposed and debated in the early 1980s, and it is still clear that we need both high-quality genome sequences and time to fully understand their encoded biological information; the success of the HGP lies in the separation of these two goals.
Abstract: An altered pattern of epigenetic modifications, such as DNA methylation and histone modification, is critical to many common human diseases, including cancer. Recently, mitochondrial DNA (mtDNA) was reported to be associated with tumorigenesis through epigenetic regulation of methylation patterns. One of the promising approaches to studying DNA methylation and CpG islands (CGIs) is the sequencing and analysis of clones derived from a physical library generated with methyl-CpG-binding domain proteins and the restriction enzyme MseI. In this study, we observed that the most redundant sequences, from 349 clones in a human CGI library, were all generated from the human mitochondrial genome. Further analysis indicated that there was a 5,845-bp DNA transfer from mtDNA to chromosome 1, and that all the clones should be the products of a 510-bp MseI fragment, which contained a putative CGI of 270 bp. The 510-bp fragment was annotated as part of cytochrome c oxidase subunit II (COXII), and phylogenetic analysis of homologous sequences containing COXII showed three DNA transfer events from mtDNA to the nuclear genome, one of which underwent secondary transfer events between different chromosomes. These results may further our understanding of how mtDNA regulates DNA methylation in the nucleus.
Abstract: To identify genes associated with glioblastoma differentiation, ESTs differentially expressed between control cells and the human glioblastoma cell line BT-325 induced to differentiate by all-trans retinoic acid (RA) were isolated by DDRT-PCR. Of the 46 ESTs sequenced, 19 are from new genes. A full-length 1,535-bp cDNA, termed GDR1, was isolated from a human cDNA library using a probe designed according to one of the novel ESTs, HGBB098. The open reading frame of the GDR1 gene encodes a putative protein containing 334 amino acid residues. A BLAST search against the current GenBank DNA and protein sequence databases did not reveal significant homology with any known proteins. RT-PCR shows that the GDR1 mRNA level increased in differentiated BT-325 cells after treatment with RA. The different expression patterns of GDR1 mRNA in human tissues were detected by multiple-tissue Northern blot hybridization.
Funding: Supported by grants from the National Basic Research Program (973 Program; 2011CB944100 and 2011CB944101) and the National Natural Science Foundation of China (90919024) awarded to JY, and by the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-R-01-04) to SH.
Abstract: Since the human genome is mostly transcribed, genetic variations must exhibit sequence signatures reflecting the relationship between transcription processes and chromosomal structures, as we have observed in unicellular organisms. In this study, a set of 646 ubiquitous expression-invariable genes (EIGs) that are present in germline cells was defined and examined based on RNA-sequencing data from multiple high-throughput transcriptomic datasets. We demonstrated a relationship between gene expression level and transcript-centric mutations in the human genome based on single nucleotide polymorphism (SNP) data. A significant positive correlation was shown between gene expression and mutation, where highly expressed genes accumulate more mutations than lowly expressed genes. Furthermore, we found four major types of transcript-centric mutations (C→T, A→G, C→ and G→T) in human genomes and identified a negative gradient of sequence variations running from the 5' end to the 3' end of the transcription units (TUs). The periodic occurrence of these genetic variations across TUs is associated with nucleosome phasing. We propose that transcript-centric mutations are one of the major driving forces for gene and genome evolution, along with the creation of new genes, gene/genome duplication, and horizontal gene transfer.
Funding: Supported by the Senior User Project of RV KEXUE (Grant No. KEXUE2019GZ05), the Center for Ocean Mega-Science, Chinese Academy of Sciences; the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2021QZKK0100); the National Key R&D Program of China (Grant No. 2022YFF1002801); and the National Natural Science Foundation of China (Grant No. 92251302).
Abstract: Cold seeps in the deep sea are closely linked to energy exploration as well as global climate change. The alkane-dominated chemical energy-driven model makes cold seeps an oasis of deep-sea life, showcasing an unparalleled reservoir of microbial genetic diversity. Here, by analyzing 113 metagenomes collected from 14 global sites across 5 cold seep types, we present a comprehensive Cold Seep Microbiomic Database (CSMD) to archive the genomic and functional diversity of cold seep microbiomes. The CSMD includes over 49 million non-redundant genes and 3175 metagenome-assembled genomes, which represent 1895 species spanning 105 phyla. In addition, beta diversity analysis indicates that both the sampling site and the cold seep type have a substantial impact on prokaryotic microbiome community composition. Heterotrophic and anaerobic metabolisms are prevalent in microbial communities, accompanied by considerable numbers of mixotrophs and facultative anaerobes, highlighting the versatile metabolic potential in cold seeps. Furthermore, secondary metabolic gene cluster analysis indicates that at least 98.81% of the sequences potentially encode novel natural products, with ribosomally synthesized and post-translationally modified peptides being the predominant type, widely distributed in archaea and bacteria. Overall, the CSMD represents a valuable resource that would enhance the understanding and utilization of global cold seep microbiomes.
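Beta diversity, as mentioned in the abstract above, compares community composition between samples. The abstract does not say which metric was used; the sketch below assumes Bray-Curtis dissimilarity purely for illustration, and the function name and abundance values are hypothetical.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both inputs list non-negative abundances for the same taxa in the same
    order. The result is 0.0 for identical communities and 1.0 for
    communities that share no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        # Two empty samples are treated as identical (an assumption).
        return 0.0
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return difference / total


# Made-up abundance counts for three taxa at two hypothetical sites.
site_1 = [10, 0, 5]
site_2 = [5, 5, 5]
print(bray_curtis(site_1, site_2))
```

A pairwise matrix of such values across all samples is the usual input for ordination or for testing whether grouping factors, such as sampling site or seep type, explain differences in composition.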
Funding: Supported by the National Key Research and Development Program of China (2021YFA0909500).
Abstract: Motifs, defined as short, conserved sequences with biological functionality, play a crucial role in various biological processes. Within the same protein family, different subfamilies may contain distinct combinations of motifs, and the composition of multiple motifs in a specific order can characterize the functionality of various proteins (Lu et al., 2020).
Funding: Supported by the National Natural Science Foundation of China (grants 91540203 and 31788103 to X.C., 31771872 to X.S.); the National Key Research and Development Program of China (2016YFD0100904); the Genetically Modified Breeding Major Projects (grant no. 2016ZX08009001-005 to X.S.); the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDY-SSWSMC022 to X.C.); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB27030201 to X.C.); and the State Key Laboratory of Plant Genomics.
Abstract: In plants, microRNA (miRNA) functions in the post-transcriptional repression of target mRNAs have been well explored. However, the mechanisms regulating the accumulation of miRNAs remain poorly understood. Here, we report that distinct mechanisms regulate accumulation of a monocot-specific miRNA, rice (Oryza sativa) miR528. At the transcriptional level, miR528 accumulated to higher levels in older plants than in young seedlings and exhibited aging-modulated gradual accumulation and diurnal rhythms in leaves; at the post-transcriptional level, aging also modulated miR528 levels by enhancing pri-miR528 alternative splicing. We found that miR528 promotes rice flowering under long-day conditions by targeting RED AND FAR-RED INSENSITIVE2 (OsRFI2). Moreover, natural variations in the MIR528 promoter region caused differences in miR528 expression among rice varieties, which are correlated with their different binding affinities for the transcription factor OsSPL9, which activates the expression of miR528. Taken together, our findings reveal that rice plants have evolved sophisticated modes of fine-tuning miR528 levels and provide insight into the mechanisms that regulate MIRNA expression in plants.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 81373922) and the Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (Grant No. CIFMS, 2016-I2M-3-016).
Abstract: Global attention has been paid to the potential hazards of traditional herbal medicinal products (THMPs). Substandard and counterfeit THMPs, including traditional Chinese patent medicines, health foods, and dietary supplements, are potential threats to public health. Recent marketplace studies using DNA barcoding have determined that current quality control methods are not sufficient for ensuring the presence of authentic herbal ingredients or for detecting contaminants and adulterants. An efficient biomonitoring method for THMPs is greatly needed. Herein, metabarcoding and single-molecule, real-time (SMRT) sequencing were used to detect the multiple ingredients in Jiuwei Qianghuo Wan (JWQHW), a classical herbal prescription widely used in China for the last 800 years. Reference experimental mixtures and commercial JWQHW products from the marketplace were used to confirm the method. Successful SMRT sequencing recovered 5416 and 4342 circular-consensus sequencing (CCS) reads belonging to the ITS2 and psbA-trnH regions. The results suggest that the combination of metabarcoding and SMRT sequencing is repeatable, reliable, and sensitive enough to detect species in THMPs, and that errors in SMRT sequencing did not affect the ability to identify multiple prescribed species and several adulterants/contaminants. The approach has the potential to become a valuable tool for the biomonitoring of multi-ingredient THMPs.
Funding: Supported by the China Postdoctoral Science Foundation (Grant No. 2013M530694 to XL), the National Natural Science Foundation of China (Grant Nos. 31271385 to SH, 31100915 to LH, and 31123007 to LZ), and the State Key Laboratory of Plant Genomics of China (Grant No. 2015B0129-03).
Abstract: With the development of genomics and bioinformatics, especially the extensive application of high-throughput sequencing technology, more transcriptional units with little or no protein-coding potential have been discovered. Such RNA molecules are called non-protein-coding RNAs (npcRNAs or ncRNAs). Among them, long npcRNAs or ncRNAs (lnpcRNAs or lncRNAs) represent diverse classes of transcripts longer than 200 nucleotides. In recent years, lncRNAs have been considered important regulators in many essential biological processes. In plants, although a large number of lncRNA transcripts have been predicted and identified in a few species, our current knowledge of their biological functions is still limited. Here, we summarize recent studies on their identification, characteristics, classification, bioinformatics, resources, and current exploration of their biological functions in plants.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31271385), the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX2-EW-R-01-04), and the National High-Tech R&D Program (863 Program, Grant No. 2009AA01A130) from the Ministry of Science and Technology of China.
Abstract: Ovary development is a complex process involving numerous genes. A well-developed ovary is essential for females to maintain fertility and produce offspring. To gain better insight into the molecular mechanisms of mammalian ovary development, we performed a comparative transcriptomic analysis of ovaries isolated from infant and adult mice using next-generation sequencing technology (SOLiD). We identified 15,454 and 16,646 transcriptionally active genes at the infant and adult stages, respectively. Among these genes, we also identified 7021 differentially expressed genes. Our analysis suggests that, in general, the adult ovary has a higher level of transcriptomic activity. However, it appears that genes related to primordial follicle development, such as those encoding Figla and Nobox, are more active in the infant ovary, whereas expression of genes vital for follicle development, such as Gdf9, Bmp4 and Bmp15, is upregulated in the adult. These data suggest a dynamic shift in gene expression during ovary development, and, when additional functional gene studies are considered, it is apparent that these changes function to facilitate follicle maturation. Furthermore, our investigation has also revealed several important functional pathways, such as apoptosis, MAPK and steroid biosynthesis, that appear to be much more active in the adult ovary than in the infant ovary. These findings will provide a solid foundation for future studies on ovary development in mice and other mammals and help to expand our understanding of the complex molecular and cellular events that occur during postnatal ovary development.
Funding: Supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA08020102 to ZZ and SH); the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2018134 to LH); the National Programs for High Technology Research and Development (Grant Nos. 2015AA020108 and 2012AA020409 to ZZ); the 100-Talent Program of the Chinese Academy of Sciences (to YB and ZZ); and the National Natural Science Foundation of China (Grant No. 31100915 to LH).
Abstract: Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to use these valuable data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on the integration of large-scale RNA-seq data and release a new annotation system, IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that, compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to the massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
Funding: This work was supported by the "863" Program of the MOSC (Grant No. 863-J19) and the Creative Program of the Chinese Academy of Sciences (Grant No. KSCX1-D4).
Abstract: Metabotropic glutamate receptor 7, coupled with the chemical neurotransmitter L-glutamate, plays an important role in the development of many psychiatric and neurological disorders. To study the biological and genetic mechanisms of mGluR7-related diseases, a physical map covering the full-length mGluR7 genomic sequence has been constructed through seed clone screening and fingerprinting database searching. The BAC clones in the physical map have been sequenced with a shotgun strategy and assembled with the Phred-Phrap-Consed software; the error rate of the final genomic sequence is less than 0.01%. mGluR7 spans an 880 kb genomic region; the GC content and repeat content of the mGluR7 genomic sequence are 38% and 37.5%, respectively. mGluR7 has a typical 'house-keeping' promoter and consists of 11 exons, with introns ranging from 6 kb to 285 kb. mGluR7a and mGluR7b are two known alternative splicing variants. Comparing the genomic structures of extracellular domains of mGluR family, their genomic structures can
Funding: Supported by the "Strategic Priority Research Program" of the Chinese Academy of Sciences, Stem Cell and Regenerative Medicine Research (Grant No. XDA01040405); the National Programs for High Technology Research and Development (863 Projects, Grant No. 2012AA022502); and the National Key Scientific Instrument and Equipment Development Projects of China (Grant No. 2011YQ03013404), awarded to XF.
Abstract: Myeloid leukemias are highly diverse diseases and have been shown to be associated with microRNA (miRNA) expression aberrations. The present study involved an in-depth miRNome analysis of two human acute myeloid leukemia (AML) cell lines, HL-60 and THP-1, and one human chronic myeloid leukemia (CML) cell line, K562, via massively parallel signature sequencing. mRNA expression profiles of these cell lines, established previously in our lab, facilitated an integrative analysis of miRNA and mRNA expression patterns. miRNA expression profiling followed by differential expression analysis and target prediction suggested numerous miRNA signatures in AML and CML cell lines. Some miRNAs may act as either tumor suppressors or oncomiRs in AML and CML by targeting key genes in AML and CML pathways. Expression patterns of cell type-specific miRNAs could partially reflect the characteristics of the K562, HL-60 and THP-1 cell lines, such as actin filament-based processes, responsiveness to stimulus and phagocytic activity. miRNAs may also regulate myeloid differentiation, since they usually suppress differentiation regulators. Our study provides a resource for further investigating the use of miRNAs in human leukemia subtyping, leukemogenesis and myeloid development. In addition, the distinctive miRNA signatures may be potential candidates for the clinical diagnosis, prognosis and treatment of myeloid leukemias.