Abstract: Predicting material stability is essential for accelerating the discovery of advanced materials for renewable energy, aerospace, and catalysis. Traditional approaches such as Density Functional Theory (DFT) are accurate but computationally expensive, which makes them unsuitable for high-throughput screening. This study introduces a machine learning (ML) framework trained on high-dimensional data from the Open Quantum Materials Database (OQMD) to predict formation energy, a key stability metric. Among the evaluated models, deep learning outperformed Gradient Boosting Machines and Random Forest, achieving an R² of up to 0.88. Feature importance analysis identified thermodynamic, electronic, and structural properties as the primary drivers of stability, offering interpretable insights into material behavior. Compared to DFT, the proposed ML framework significantly reduces computational costs, enabling the rapid screening of thousands of compounds. These results highlight ML’s transformative potential in materials discovery, with direct applications in energy storage, semiconductors, and catalysis.
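The R² value quoted in this abstract is the coefficient of determination. As a minimal sketch of how such a score is computed (the energies below are made-up placeholders, not OQMD data or results from the study, and `r2_score` is a hypothetical helper name), it is one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the predictions against the reference values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared deviation of the reference values from their own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Placeholder formation energies in eV/atom (illustrative only).
dft_energy = [-1.0, -0.5, 0.0, 0.5, 1.0]   # stand-in for reference values
ml_energy = [-0.9, -0.6, 0.1, 0.4, 1.1]    # stand-in for model predictions

score = r2_score(dft_energy, ml_energy)
print(f"R^2 = {score:.3f}")
```

A score of 1 means the predictions reproduce the reference values exactly, while a score of 0 means they do no better than always predicting the mean.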
Funding: Supported by the National Natural Science Foundation of China (31272186, 31301791).
Abstract: Gene sequencing is a powerful way to interpret life, and high-throughput sequencing is a revolutionary innovation in gene sequencing research. The technology is characterized by low cost and high data throughput. High-throughput sequencing has been widely applied in multi-level research on genomics, transcriptomics, and epigenomics. It has fundamentally changed the way we approach problems in basic and translational research and has created many new possibilities. This paper presents a general description of high-throughput sequencing technology and a comprehensive review of its applications in plain, concise, and precise language, in order to help researchers finish their work faster and better and to help science amateurs understand the technology more easily.
Abstract:
BACKGROUND: The broader use of high-throughput technologies has led to improved molecular characterization of hepatocellular carcinoma (HCC).
AIM: To comprehensively analyze and characterize all publicly available genomic, gene expression, methylation, miRNA, and proteomic data in HCC, covering 85 studies and 3355 patient sample profiles, in order to identify the key dysregulated genes and the pathways they affect.
METHODS: We collected and curated all well-annotated, publicly available high-throughput datasets from PubMed and Gene Expression Omnibus derived from human HCC tissue. Comprehensive pathway enrichment analysis was performed using pathDIP for each data type (genomic, gene expression, methylation, miRNA, and proteomic), and the overlap of pathways was assessed to elucidate pathway dependencies in HCC.
RESULTS: The PubMed search retrieved a total of 8733 abstracts on HCC covering the different layers of data from human HCC samples, published up to December 2016. The common key dysregulated pathways in HCC tissue across the different layers of data included the epidermal growth factor receptor (EGFR) and β1-integrin pathways. Genes along these pathways were significantly and consistently dysregulated across the different types of high-throughput data and had prognostic value with respect to overall survival. According to the CTD database, estradiol would best modulate and revert these genes.
CONCLUSION: By analyzing and integrating all available high-throughput genomic, transcriptomic, miRNA, methylation, and proteomic data from human HCC tissue, we identified EGFR, β1-integrin, and axon guidance as pathway dependencies in HCC. These are master regulators of key pathways in HCC, such as the mTOR, Ras/Raf/MAPK, and p53 pathways. The genes implicated in these pathways had prognostic value in HCC, with Netrin and Slit3 being novel proteins of prognostic importance in HCC. Based on this integrative analysis, EGFR and β1-integrin are master regulators that could serve as potential therapeutic targets in HCC.
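The overlap assessment described in METHODS can be pictured as a set intersection across data layers. The following is a toy sketch only: the pathway memberships per layer are invented for illustration (they are not the study's enrichment results), the helper names are hypothetical, and no pathDIP API is used.

```python
from collections import Counter

# Hypothetical enrichment output: data layer -> set of enriched pathway names.
# Memberships are made up for illustration; they are NOT the study's results.
enriched = {
    "genomic": {"EGFR", "beta1-integrin", "p53"},
    "gene_expression": {"EGFR", "beta1-integrin", "mTOR"},
    "methylation": {"EGFR", "beta1-integrin", "axon guidance"},
    "miRNA": {"EGFR", "beta1-integrin", "Ras/Raf/MAPK"},
    "proteomic": {"EGFR", "beta1-integrin", "mTOR"},
}


def shared_pathways(results):
    """Return the pathways enriched in every data layer."""
    layers = list(results.values())
    return set.intersection(*layers) if layers else set()


def layer_support(results):
    """Count in how many data layers each pathway is enriched."""
    return Counter(p for pathways in results.values() for p in pathways)


common = shared_pathways(enriched)
support = layer_support(enriched)
print("Enriched in all layers:", sorted(common))
print("Layers per pathway:", dict(support))
```

A strict intersection is the simplest choice; the per-pathway count makes it possible to relax the criterion to pathways supported by most, rather than all, layers.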
Abstract: RNA sequencing (RNA-seq), based on next-generation sequencing technologies, has rapidly become a standard and popular technology for transcriptome analysis. However, serious challenges remain in analyzing and interpreting RNA-seq data. With the development of high-throughput sequencing technology, the sequencing depth of RNA-seq data has increased explosively. The biological processes underlying the transcriptome are more complicated and diverse than previously imagined. Moreover, most other organisms still have no available reference genome or have only incomplete genome annotations. A large number of bioinformatics methods for various transcriptomics studies have therefore been proposed to address these challenges. This review comprehensively summarizes the various studies in RNA-seq data analysis and their corresponding analysis methods, including genome annotation, quality control and pre-processing of reads, read alignment, transcriptome assembly, gene and isoform expression quantification, differential expression analysis, data visualization, and other analyses.