Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB38030200), the National Key R&D Program of China (Grant No. 2021YFF0703701), the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2023-07), the International Partnership Program of the Chinese Academy of Sciences (Grant No. 161GJHZ2022002MI), and the Open Biodiversity and Health Big Data Initiative of International Union of Biological Sciences (IUBS).
Abstract: The rapid advancement of sequencing technologies poses challenges in managing the large volume and exponential growth of sequence data efficiently and on time. To address this issue, we present GenBase (https://ngdc.cncb.ac.cn/genbase), an open-access data repository that follows the International Nucleotide Sequence Database Collaboration (INSDC) data standards and structures, for efficient nucleotide sequence archiving, searching, and sharing. As a core resource within the National Genomics Data Center (NGDC) of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GenBase offers a bilingual submission pipeline and services, as well as local submission assistance in China. GenBase also provides a unique Excel format for metadata description and feature annotation of nucleotide sequences, along with a real-time data validation system to streamline sequence submissions. As of April 23, 2024, GenBase had received 68,251 nucleotide sequences and 689,574 annotated protein sequences across 414 species from 2319 submissions. Of these, 63,614 (93%) nucleotide sequences and 620,640 (90%) annotated protein sequences have been released and are publicly accessible through GenBase’s web search system, File Transfer Protocol (FTP), and Application Programming Interface (API). Additionally, in collaboration with INSDC, GenBase has constructed an effective data exchange mechanism with GenBank and started sharing released nucleotide sequences. Furthermore, GenBase integrates all sequences from GenBank with daily updates, demonstrating its commitment to actively contributing to global sequence data management and sharing.
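The release percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the rounded figures from the counts stated above; the function name `release_percent` is a hypothetical helper, not part of any GenBase tooling.

```python
def release_percent(released: int, received: int) -> int:
    """Return the released share as a whole-number percentage."""
    if received <= 0:
        raise ValueError("received must be positive")
    return round(100 * released / received)


# Counts as stated in the abstract (as of April 23, 2024).
nucleotide_pct = release_percent(63_614, 68_251)
protein_pct = release_percent(620_640, 689_574)

print(f"nucleotide sequences released: {nucleotide_pct}%")
print(f"annotated protein sequences released: {protein_pct}%")
```

Both values agree with the 93% and 90% given in the abstract.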
Funding: Supported by the Strategic Priority Research Program of Chinese Academy of Sciences (Grant Nos. XDB38060100 and XDB38030200 to YB; XDB38050300 to WZ; XDB38030400 to JX; XDA19050302 to ZZ), the National Key R&D Program of China (Grant Nos. 2016YFE0206600 to YB; 2020YFC0847000, 2018YFD1000505, 2017YFC1201202, and 2016YFC0901603 to WZ; 2017YFC0907502 to ZZ), the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05 to YB), the Genomics Data Center Construction of Chinese Academy of Sciences (Grant No. XXH-13514-0202 to YB), the Open Biodiversity and Health Big Data Programme of International Union of Biological Sciences to YB, the Professional Association of the Alliance of International Science Organizations (Grant No. ANSO-PA-2020-07 to YB), the National Natural Science Foundation of China (Grant Nos. 32030021 and 31871328 to ZZ), and the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008 to ZZ).
Abstract: The Genome Warehouse (GWH) is a public repository housing genome assembly data for a wide range of species and delivering a series of web services for genome data submission, storage, release, and sharing. As one of the core resources in the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB; https://ngdc.cncb.ac.cn), GWH accepts both full and partial (chloroplast, mitochondrion, and plasmid) genome sequences with different assembly levels, as well as updates of existing genome assemblies. For each assembly, GWH collects detailed genome-related metadata of biological project, biological sample, and genome assembly, in addition to genome sequence and annotation. To archive high-quality genome sequences and annotations, GWH is equipped with a uniform and standardized procedure for quality control. Besides basic browse and search functionalities, all released genome sequences and annotations can be visualized with JBrowse. As of May 21, 2021, GWH had received 19,124 direct submissions covering 1108 species and had released 8772 of them. Collectively, GWH serves as an important resource for genome-scale data management and provides free and publicly accessible data to support research activities throughout the world. GWH is publicly accessible at https://ngdc.cncb.ac.cn/gwh.
Funding: Supported by grants from the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDB13020500), the National Natural Science Foundation of China (NSFC) (Grant Nos. 91131905, 31471199, and 91631304), the Key Research Program of Chinese Academy of Sciences (Grant No. KJZD-EW-L14 to CZ), the NSFC (Grant Nos. 31440057 and 31701081 to WC), the 111 Project (Grant No. B13003 to WC and DZ), and the Innovation Promotion Association of Chinese Academy of Sciences (Grant Nos. 2016098 to DZ and 2019103 to AC).
Abstract: Postzygotic mutations are acquired in normal tissues throughout an individual’s lifetime and hold clues for identifying mutagenic factors. Here, we investigated postzygotic mutation spectra of healthy individuals using optimized ultra-deep exome sequencing of time-series samples from the same volunteer as well as samples from different individuals. In blood, sperm, and muscle cells, we resolved three common types of mutational signatures. Signatures A and B represent clocklike mutational processes, and polymorphisms of epigenetic regulation genes influence the proportion of signature B in mutation profiles. Notably, signature C, characterized by C>T transitions at GpCpN sites, tends to be a feature of diverse normal tissues. Mutations of this type are likely to occur early during embryonic development, as supported by their relatively high allelic frequencies, presence in multiple tissues, and decrease in occurrence with age. Almost none of the public datasets for tumors feature this signature, except for 19.6% of samples of clear cell renal cell carcinoma with increased activation of the hypoxia-inducible factor 1 (HIF-1) signaling pathway. Moreover, the accumulation of signature C in the mutation profile was accelerated in a human embryonic stem cell line with drug-induced activation of HIF-1α. Thus, embryonic hypoxia may explain this novel signature across multiple normal tissues. Our study suggests that a hypoxic condition in an early stage of embryonic development is a crucial factor inducing C>T transitions at GpCpN sites, and that individuals’ genetic background may also influence their postzygotic mutation profiles.
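To make the phrase "C>T transitions at GpCpN sites" concrete, the following is a minimal sketch of how a single substitution could be tested for that context. It is not the authors' pipeline: the function name `is_gpcpn_c_to_t` and the input convention (a three-base reference context with the mutated base in the middle, plus the alternate base) are assumptions made for illustration. Purine-reference calls are flipped to the opposite strand so that a G>A change read on one strand is treated as the equivalent C>T change on the other, which is the usual pyrimidine-centered convention in mutational signature analysis.

```python
# Map each base to its complement for strand flipping.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_gpcpn_c_to_t(context: str, alt: str) -> bool:
    """Return True if the substitution is a C>T change with a 5' G (GpCpN).

    context: 3-base reference sequence, mutated base in the middle
             (hypothetical input convention for this sketch).
    alt:     the alternate (mutant) base.
    """
    context = context.upper()
    alt = alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt")
    if set(context + alt) - set("ACGT"):
        raise ValueError("only A, C, G, T are supported")

    # If the reference base is a purine, describe the change on the
    # opposite strand (reverse complement) so the reference is C or T.
    if context[1] in "AG":
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)

    return context[0] == "G" and context[1] == "C" and alt == "T"


print(is_gpcpn_c_to_t("GCA", "T"))  # C>T with a 5' G
print(is_gpcpn_c_to_t("ACA", "T"))  # C>T but the 5' base is A
print(is_gpcpn_c_to_t("TGC", "A"))  # G>A, equivalent to GCA C>T on the other strand
```

The third base of the context is deliberately unconstrained, matching the "N" in GpCpN.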