Funding: This work was supported by the Provincial Technology Innovation Program of Shandong, the Ningxia Hui Autonomous Region agricultural breeding special project (NXNYYZ202001), the Jiangsu Seed Industry Revitalization Competitive Project JBGS(2021)072, the Ningbo Science and Technology Innovation Project 2021Z132, and the Weifang Seed Innovation Group.
Abstract: Watermelon, Citrullus lanatus, is the world's third largest fruit crop. Reference genomes with gaps and a narrow genetic base hinder functional genomics and genetic improvement of watermelon. Here, we report the assembly of a telomere-to-telomere gap-free genome of the elite watermelon inbred line G42 by incorporating high-coverage and accurate long-read sequencing data with multiple assembly strategies. All 11 chromosomes have been assembled into single-contig pseudomolecules without gaps, representing the highest completeness and assembly quality to date. The G42 reference genome is 369,321,829 bp in length and contains 24,205 predicted protein-coding genes, with all 22 telomeres and 11 centromeres characterized. Furthermore, we established a pollen-EMS mutagenesis protocol and obtained over 200,000 M1 seeds from G42. In a sampling pool, 48 monogenic phenotypic mutations, selected from 223 M1 and 78 M2 mutants with morphological changes, were confirmed. The average mutation density was 1 SNP/1.69 Mb and 1 indel/4.55 Mb per M1 plant and 1 SNP/1.08 Mb and 1 indel/6.25 Mb per M2 plant. Taking advantage of the gap-free G42 genome, 8,039 mutations from 32 plants sampled from M1 and M2 families were identified with 100% accuracy, whereas only 25% of the randomly selected mutations identified using the 97103v2 reference genome could be confirmed. Using this library and the gap-free genome, two genes responsible for elongated fruit shape and male sterility (CiMs1) were identified, both caused by a single base change from G to A. The validated gap-free genome and its EMS mutation library provide invaluable resources for functional genomics and genetic improvement of watermelon.
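To put the reported mutation densities in per-plant terms, the short sketch below converts each "1 mutation per N Mb" figure into an approximate genome-wide count, using only the genome length and densities stated in the abstract. It assumes 1 Mb = 1,000,000 bp and that the density applies uniformly across the whole assembly; both are simplifying assumptions made here, not statements from the authors, and the names `GENOME_BP`, `MB_PER_MUTATION` and `expected_count` are illustrative.

```python
# Genome length of the G42 assembly, as reported in the abstract.
GENOME_BP = 369_321_829

# Reported densities, expressed as "one mutation per N Mb".
MB_PER_MUTATION = {
    ("M1", "SNP"): 1.69,
    ("M1", "indel"): 4.55,
    ("M2", "SNP"): 1.08,
    ("M2", "indel"): 6.25,
}


def expected_count(genome_bp: int, mb_per_mutation: float) -> float:
    """Approximate genome-wide mutation count for a given density.

    Assumes 1 Mb = 1e6 bp and a uniform density over the genome.
    """
    return genome_bp / (mb_per_mutation * 1_000_000)


# Approximate per-plant counts for each generation and mutation type.
expected = {
    key: expected_count(GENOME_BP, density)
    for key, density in MB_PER_MUTATION.items()
}

for (generation, kind), count in expected.items():
    print(f"{generation} {kind}: ~{count:.0f} per plant")
```

Under these assumptions the densities correspond to roughly 200 to 350 SNPs and 60 to 80 indels per plant, the same order of magnitude as the 8,039 mutations reported across 32 sampled plants.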
Funding: Supported by grants from the Natural Science Foundation of China (31271385) and the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-R-01-04).
Abstract: Transcript expression is spatially and temporally regulated in a complex, precise and specific manner; however, most studies have focused on protein-coding genes. Recently, massively parallel cDNA sequencing (RNA-seq) has emerged as a promising tool for transcriptome research, and many non-coding RNAs, especially lincRNAs, have been identified and characterized as important regulators of diverse biological processes. In this study, we used ultra-deep RNA-seq data from 15 mouse tissues to study the diversity and dynamics of non-coding RNAs in mouse. Using our own criteria, we identified a total of 16,249 non-coding genes (21,569 non-coding RNAs). We annotated these non-coding RNAs by diverse properties and found that, compared with protein-coding genes, they are generally shorter, have fewer exons, are expressed at lower levels and are more strikingly tissue-specific. Moreover, these non-coding RNAs show significant enrichment for transcriptional initiation and elongation signals, including histone modifications (H3K4me3, H3K27me3 and H3K36me3), RNAPII binding sites and CAGE tags. Gene set enrichment analysis (GSEA) revealed several sets of lincRNAs associated with diverse biological processes such as immune effector process, muscle development and sexual reproduction. Taken together, this study provides a more comprehensive annotation of mouse non-coding RNAs and offers an opportunity for future functional and evolutionary studies of mouse non-coding RNAs.