Browsing by Author "Yan, Jingwen"
Now showing 1 - 10 of 74
Biomarker- and Pathway-Informed Polygenic Risk Scores for Alzheimer's Disease and Related Disorders (2022-05) Chasioti, Danai; Yan, Jingwen; Saykin, Andrew J.; Nho, Kwangsik; Risacher, Shannon L.; Wu, Huanmei
Determining an individual’s genetic susceptibility in complex diseases like Alzheimer’s disease (AD) is challenging as multiple variants each contribute a small portion of the overall risk. Polygenic Risk Scores (PRS) are a mathematical construct or composite that aggregates the small effects of multiple variants into a single score. Potential applications of PRS include risk stratification, biomarker discovery and increased prognostic accuracy. A systematic review demonstrated that methodological refinement of PRS is an active research area, mostly focused on large case-control genome-wide association studies (GWAS). In AD, where there is considerable phenotypic and genetic heterogeneity, we hypothesized that PRS based on endophenotypes and pathway-relevant genetic information would be particularly informative. In the first study, data from the NIA Alzheimer’s Disease Neuroimaging Initiative (ADNI) were used to develop endophenotype-based PRS based on amyloid (A), tau (T), neurodegeneration (N) and cerebrovascular (V) biomarkers, as well as an overall/combined endophenotype-PRS. Results indicated that combined phenotype-PRS predicted neurodegeneration biomarkers and overall AD risk. By contrast, amyloid and tau-PRSs were strongly linked to the corresponding biomarkers. Finally, extrinsic significance of the PRS approach was demonstrated by application of AD biological pathway-informed PRS to prediction of cognitive changes among older women with breast cancer (BC).
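The aggregation step that defines a PRS can be illustrated with a short sketch. It assumes the common weighted-sum form, in which each variant's risk-allele dosage (0, 1, or 2) is multiplied by a per-variant effect size and the products are summed. The function name, effect sizes, and genotypes below are hypothetical and are not taken from the dissertation.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages across variants.

    dosages:      risk-allele counts per variant (0, 1, or 2) for one person
    effect_sizes: per-variant weights, e.g. GWAS log odds ratios (assumed)
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))


# Hypothetical example: three variants with small individual effects.
effect_sizes = [0.12, 0.05, -0.03]
person = [2, 1, 0]
score = polygenic_risk_score(person, effect_sizes)
print(round(score, 4))
```

A biomarker- or pathway-informed score would follow the same form, with the variant set and weights restricted to those relevant to the chosen biomarker or pathway.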
Results from PRS analysis of the multicenter Thinking and Living with Cancer (TLC) study indicated that older BC patients with high AD genetic susceptibility within the immune-response and endocytosis pathways have worse cognition following chemotherapy ± hormonal therapy than following hormonal-only therapy. In conclusion, PRSs based on biomarker- or pathway-specific genetic information may provide mechanistic insights beyond disease susceptibility, supporting development of precision medicine with potential application to AD and other age-associated cognitive disorders.

Brain-wide structural connectivity alterations under the control of Alzheimer risk genes (Inderscience, 2020) Yan, Jingwen; Raja V, Vinesh; Huang, Zhi; Amico, Enrico; Nho, Kwangsik; Fang, Shiaofen; Sporns, Olaf; Wu, Yu-chien; Saykin, Andrew; Goni, Joaquin; Shen, Li; BioHealth Informatics, School of Informatics and Computing
Background: Alzheimer's disease is the most common form of brain dementia, characterized by gradual loss of memory followed by further deterioration of other cognitive functions. Large-scale genome-wide association studies have identified and validated more than 20 AD risk genes. However, how these genes are related to the brain-wide breakdown of structural connectivity in AD patients remains unknown. Methods: We used the genotype and DTI data in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. After constructing the brain network for each subject, we extracted three types of link measures, including fiber anisotropy, fiber length and density. We then performed a targeted genetic association analysis of brain-wide connectivity measures using general linear regression models. Age at scan and gender were included in the regression model as covariates.
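The covariate-adjusted regression described above can be sketched as follows. This is a minimal illustration on made-up data, assuming an additive genotype coding (0, 1, or 2 copies of the minor allele) and a single connectivity measure as the outcome. The `ols` helper, variable names, and values are hypothetical, and the p-value computation is omitted.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical subjects: (SNP dosage, age at scan, gender coded 0/1).
subjects = [(0, 70, 0), (1, 72, 1), (2, 68, 0), (1, 75, 0), (0, 80, 1), (2, 77, 1)]

# Design matrix: intercept, SNP dosage, and the two covariates.
X = [[1.0, snp, age, gender] for snp, age, gender in subjects]

# Synthetic connectivity measure built from known coefficients (no noise),
# so the fit should recover them.
y = [1.0 + 0.5 * snp + 0.02 * age - 0.1 * gender for snp, age, gender in subjects]

beta = ols(X, y)
snp_effect = beta[1]  # genetic effect adjusted for age and gender
print([round(b, 4) for b in beta])
```

In a brain-wide analysis this fit would be repeated for every SNP and connection, which is why a multiple-testing correction such as Bonferroni is applied to the resulting p-values.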
For fair comparison of the genetic effect on different measures, fiber anisotropy, fiber length and density were all normalized to a mean of 0 and a standard deviation of 1. We aimed to discover the abnormal brain-wide network alterations under the control of 34 AD risk SNPs identified in previous large-scale genome-wide association studies. Results: After enforcing the stringent Bonferroni correction, rs10498633 in SLC24A4 was found to be significantly associated with anisotropy, total number and length of fibers, including some connecting brain hemispheres. At a less stringent significance level of 5e-6, we observed significant genetic effects of SNPs in APOE, ABCA7, EPHA1 and CASS4 on various brain connectivity measures.

Celltyper: A Single-Cell Sequencing Marker Gene Tool Suite (2023-05) Paisley, Brianna Meadow; Liu, Yunlong; Yan, Jingwen; Cao, Sha; Wang, Juexin; Carfagna, Mark
Single-cell RNA-sequencing (scRNA-seq) has enabled researchers to study interindividual cellular heterogeneity, to explore disease impact on cellular composition of tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq analysis is to identify the cell type of individual cells. Accurate cell type identification is crucial for any scRNA-seq analysis to be valid as incorrect cell type assignment will reduce statistical robustness and may lead to incorrect biological conclusions. Therefore, accurate and comprehensive cell type assignment is necessary for reliable biological insights into scRNA-seq datasets. With over 200 distinct cell types in humans alone, the concept of cell identity is broad. Even within the same cell type there exists heterogeneity due to cell cycle phase, cell state, cell subtypes, cell health and the tissue microenvironment. This makes cell type classification a complicated biological problem requiring bioinformatics. One approach to classify cell type identity is using marker genes. Marker genes are genes specific for one or a few cell types.
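As a minimal illustration of marker-gene-based classification (not the CellTypeR algorithm, which the abstract does not specify), the sketch below scores a cell against each cell type by the mean expression of that type's marker genes and picks the highest-scoring type. The function name, marker sets, and expression values are hypothetical and chosen only for illustration.

```python
def score_cell(expression, marker_sets):
    """Score one cell against each cell type.

    expression:  dict mapping gene name -> expression value for one cell
    marker_sets: dict mapping cell type -> list of marker gene names
    Returns (best_type, scores), where each score is the mean expression
    of that type's markers; genes missing from the cell count as 0.
    """
    scores = {}
    for cell_type, genes in marker_sets.items():
        values = [expression.get(gene, 0.0) for gene in genes]
        scores[cell_type] = sum(values) / len(values)
    best_type = max(scores, key=scores.get)
    return best_type, scores


# Illustrative marker sets and a single hypothetical cell.
marker_sets = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
}
cell = {"CD3D": 5.0, "CD3E": 4.0, "MS4A1": 0.5}

best_type, scores = score_cell(cell, marker_sets)
print(best_type, scores)
```

A scheme this simple is only as good as its marker lists, which is why the specificity of markers across sources, samples, and species matters.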
When coupled with bioinformatic methods, marker genes show promise of improving cell type classification. However, current scRNA-seq classification methods and databases use marker genes that are non-specific across sources, samples, and/or species, leading to bias and errors. Furthermore, many existing tools require manual intervention by the user to provide training datasets or the expected number and name of cell types, which can introduce selection bias. The selection bias negatively impacts the accuracy of cell type classification methods as the model cannot extrapolate outside of the user inputs even when it is biologically meaningful to do so. In this dissertation I developed CellTypeR, a suite of tools to explore the biology governing cell identity in a “normal” state for humans and mice. The work presented here accomplishes three aims: 1. Develop an ontology standardized database of published marker gene literature; 2. Develop and apply a marker gene classification algorithm; and 3. Create a user interface and input data structure for scRNA-seq cell type prediction.

A Comprehensive Survey and Deep Learning-Based Prediction on G-quadruplex Formation and Biological Functions (2022-09) Fang, Shuyi; Wan, Jun; Liu, Yunlong; Yan, Jingwen; Zhang, Jie
G-quadruplexes (G4s) are guanine-rich four-stranded DNA/RNA structures, which have been found throughout the human genome. G4s have been reported to affect chromatin structure and are involved in important biological processes at transcriptional and epigenetic levels. However, the underlying molecular mechanisms and the locations of G4s remain elusive due to the complexity of G4s.
Taking advantage of the development of high-throughput sequencing technologies and machine learning approaches, we conducted a comprehensive investigation of G4 structures, including the discovery of a novel marker for functional human hematopoietic stem cells, which prompted our interest in G4 structure; exploration of the association between G4s and genomic factors by incorporating multi-omics data; and development of a deep-learning-based G4 prediction tool with G4 motifs. First, we discovered ADGRG1 as a novel marker for functional human hematopoietic stem cells and its regulation through transcription activities. Our interest in G4s was stimulated during the transcription-related investigations. Next, we analyzed the genome-wide distribution properties of G4s and uncovered the associations of G4s with other epigenetic and transcriptional mechanisms to coordinate gene transcription. We found that different-confidence G4 groups correlated differently with epigenetic regulatory elements and revealed that G4 structures could correlate with gene expression in two opposite ways depending on their locations and forming strands. Some transcription factors were identified to be over-represented with G4 emergence. We found distinct consensus sequences enriched in the G4 feet, with a high GC content in the feet of high-confidence G4s and a high TA content in solely predicted G4 feet. As for the last part, we developed a novel deep-learning-based prediction tool for DNA G4s with G4 motifs. Considering the classical G4 motif, we applied a bidirectional LSTM model with an attention mechanism, which captures sequential information, and showed good performance in whole-genome level prediction of DNA G4s with the certified G4 pattern.
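The abstract does not spell out the motif, but the classical G4 motif is commonly written as four runs of at least three guanines separated by loops of one to seven nucleotides. Under that assumption, a simple scan for candidate motifs on one strand might look like the sketch below. The function name and test sequences are hypothetical, and this is a plain pattern search, not the deep learning model described above.

```python
import re

# Assumed classical motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, match) for non-overlapping motif hits.

    Only the given strand is scanned; the complementary strand would need
    a second pass on the reverse complement (or a C-rich pattern).
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(sequence)]


hits = find_g4_motifs("TTGGGAGGGAGGGAGGGTT")
no_hits = find_g4_motifs("ACGTACGTACGT")
print(hits)
print(no_hits)
```

Because `finditer` reports non-overlapping matches, overlapping candidates within a long G-rich region are collapsed into the first match found.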
Our comprehensive work investigated G4s, their functions, and their prediction, and provided a better understanding of G4s at the multi-omics level and of computational information capture using deep learning.

Computational Methods for Determining RNA-RNA Interactions (2023-06) Schaeper, David; Janga, Sarath Chandra; Yan, Jingwen; Srivastava, Mansi
RNA molecules play vital roles in both viruses and cells, and one way to study their function is through the RNA-RNA interactions (RRIs) that occur. RRIs form in one of two ways: through protein-mediated RRIs, where a protein brings the RNA molecules together, or through direct complementary base pairing between the molecules, called RNA-centric. Protein-mediated RRIs have been captured and analyzed through experimental protocols such as cross-linking ligation and sequencing of hybrids (CLASH) and mapping RNA interactome in vivo (MARIO). RNA-centric interactions have been investigated through experimental protocols such as ligation of interacting RNA followed by high-throughput sequencing (LIGR-seq), sequencing of psoralen crosslinked, ligated, selected hybrids (SPLASH), psoralen analysis of RNA interactions and structures (PARIS), and cross-linking of matched RNAs and deep sequencing (COMRADES). There are also tools that have been developed to predict RRIs, and the predominant tools, RNAup and IntaRNA, utilize minimum free energy (MFE) calculations. In this work, initially RRIs were studied in the context of SARS-CoV-2 and its variants to observe evolutionary changes to RRIs. Using in silico RRIs generated through the COMRADES protocol by Ziv et al. alongside computational predictions generated through IntaRNA and a large population of SARS-CoV-2 sequences, covariation analysis was used on the population stratified by variants to determine variant-specific evolutionary changes for certain long-range RRIs. Also, statistical evidence was found for a novel Beta variant-specific RNA-RNA interaction.
After this, RRIs were studied in the human HEK293T cell line through a novel experimental protocol using Oxford Nanopore long-read sequencing technology to capture more complete information on RRIs, mapped with the newly developed pipeline Alignment of Chimera through Clustering and Read Splitting (ACCRES). Through this, multi-molecule RNA interactions were detected using an iterative BLAST approach, which to our knowledge is the first time these have been reported. Interaction interfaces were quantified, and the interactions were characterized by their biotype to understand the landscape of these interactions in the cell line. A network was built, and functional enrichment was performed to show the interplay between known functions in the cell.

Computational Methods for Proteoform Identification and Characterization Using Top-Down Mass Spectrometry (2023-12) Chen, Wenrong; Yan, Jingwen; Wang, Juexin; Wan, Jun; Zang, Yong; Luo, Xiao; Liu, Xiaowen
Proteoforms, distinct molecular forms of proteins, arise due to numerous factors such as genetic mutations, differential gene expression, alternative splicing, and a range of biological processes. These proteoforms are often characterized by primary structural variances such as amino acid substitutions, terminal truncations, and post-translational modifications (PTMs). Proteoforms from the same proteins can manifest varied functional behaviors based on the specific alterations. The complexity inherent to proteoforms has elevated the significance of top-down mass spectrometry (MS) due to its proficiency in providing intricate sequence information for these intact proteoforms. During a typical top-down MS experiment, intact proteoforms are separated through platforms like liquid chromatography (LC) or capillary zone electrophoresis (CZE) prior to tandem mass spectrometry (MS/MS) analysis.
Despite advancements in instruments and protocols for top-down MS, computational challenges persist, with software tool development still in its early stage. In this dissertation, our research revolves around three primary goals, all aimed at refining proteoform characterization. First, we bridge RNA-Seq with top-down MS for better proteoform identification. We propose TopPG, an innovative proteogenomic tool which is tailored to generate proteoform sequence databases from genetic and splicing variations explicitly for top-down MS in contrast to traditional approaches. Second, to boost the accuracy of proteoform detection, we utilize machine learning methods to predict proteoform retention and migration times in top-down MS, an area previously overshadowed by bottom-up MS paradigms. Lastly, recognizing the indispensable role of post-translational modifications (PTMs) in cellular functions, we introduce PTM-TBA. This tool integrates the complementary strengths of both top-down and bottom-up MS, augmented with annotations, building a comprehensive strategy for precise PTM identification and localization.

Computational Modeling of Cell and Tissue Level Metabolic Characterization of the Human Metabolic Network by Using scRNA-seq Data (2022-06) Alghamdi, Norah Saeed; Zhang, Chi; Cao, Sha; Yan, Jingwen; Jones, Josette
The heterogeneity of metabolic pathways is a hallmark of many common disease types. Although several sources of knowledge on the core components of metabolic networks and sub-networks are now available, our knowledge of the integrated behavior and metabolic reprogramming of the cell microenvironment remains limited. Metabolic changes can be characterized by different factors, and they differ from one cell to another because of the cells' high plasticity.
The large amount of single-cell and tissue data gained from disease tissue has the potential to provide information on a cell's functional state and its underlying phenotypic changes. Hence, advanced systems biology models and computational tools are urgently needed to enable reliable characterization of metabolic variations in disease by using scRNA-seq data. Our preliminary data include (1) a new computational method to estimate cell-wise metabolic flux and states from single-cell and tissue transcriptomics data, and (2) matched scRNA-seq data and metabolomics experiments on cells under perturbed biochemical conditions and knock-down of metabolic genes, both of which form the computational and experimental foundations of this project. In this dissertation, we proposed to develop a suite of novel computational methods, systems biology models, and quantitative metrics to bring the following unmet capabilities: (1) reconstruction of context-specific and subcellular-resolution metabolic networks for different disease types, (2) estimation of cell-/sample-wise metabolic flux by considering metabolic imbalance and metabolic exchange between cells in the disease microenvironment, and (3) a systematic evaluation of the functional impact of variations in gene expression, metabolite availability and network structure on the context-specific metabolic network and flux. By implementing these methods using scRNA-seq data, we addressed the following outstanding biological questions: (i) identification of genes, metabolites, and network topology with high impact on metabolic variations, (ii) estimation of metabolic flux, and (iii) assessment of metabolic changes over the metabolic network.
Successful execution of the proposed research provides a suite of computational capabilities to analyze metabolic variations that could be broadly utilized by the biomedical research community.

Deciphering Gene Regulatory Mechanisms Through Multi-omics Integration (2022-09) Chen, Duojiao; Liu, Yunlong; Wan, Jun; Zhang, Chi; Yan, Jingwen
Complex biological systems are composed of many regulatory components, which can be measured with the advent of genomics technology. Each molecular assay is normally designed to interrogate one aspect of the cell state. However, a comprehensive understanding of the regulatory mechanism requires characterization from multiple levels such as genome, epigenome, and transcriptome. Integration of multi-omics data is urgently needed for understanding the global regulatory mechanism of gene expression. In recent years, single-cell technology has offered unprecedented resolution for a deeper characterization of cellular diversity and states. High-quality single-cell suspensions from tissue biopsies are required for single-cell sequencing experiments. Tissue biopsies need to be processed as soon as they are collected to avoid gene expression changes and RNA degradation. Although cryopreservation is a feasible solution to preserve freshly isolated samples, its effect on transcriptome profiles still needs to be investigated. Investigation of multi-omics data at the single-cell level can provide new insights into the biological process. In addition to the common method of integrating multi-omics data, it is now also possible to simultaneously profile the transcriptome and epigenome at single-cell resolution, enhancing the power of discovering new gene regulatory interactions. In this dissertation, we integrated bulk RNA-seq with ATAC-seq and several additional assays and revealed the complex mechanisms of ER–E2 interaction with nucleosomes.
A comparative analysis of fresh and frozen multiple myeloma single-cell RNA sequencing data concluded that cryopreservation is a feasible protocol for preserving cells. We also analyzed the single-cell multiome data for mesenchymal stem cells. With the unified landscape from simultaneously profiling gene expression and chromatin accessibility, we discovered distinct osteogenic differentiation potential of mesenchymal stem cells and different associations with bone disease-related traits. We gained a deeper insight into the underlying gene regulatory mechanisms with this frontier single-cell multiome sequencing technique.

Deciphering the tissue-specific functional effect of Alzheimer risk SNPs with deep genome annotation (Research Square, 2024-02-08) Pugalenthi, Pradeep Varathan; He, Bing; Xie, Linhui; Nho, Kwangsik; Saykin, Andrew J.; Yan, Jingwen; Radiology and Imaging Sciences, School of Medicine
Alzheimer’s disease (AD) is a highly heritable brain dementia, accompanied by substantial failure of cognitive function. Large-scale genome-wide association studies (GWASs) have led to a significant set of SNPs associated with AD and related traits. GWAS hits usually emerge as clusters where a lead SNP with the highest significance is surrounded by other less significant neighboring SNPs. Although functionality is not guaranteed even with the strongest associations in GWASs, lead SNPs have historically been the focus of the field, with the remaining associations inferred to be redundant. Recent deep genome annotation tools enable the prediction of function from a segment of a DNA sequence with significantly improved precision, which allows in-silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits on chromatin functions and whether this impact is altered by the genetic context (i.e., alleles of neighboring SNPs).
Our results showed that highly correlated SNPs in the same LD block could have distinct impacts on downstream functions. Although some GWAS lead SNPs showed dominant functional effects regardless of the neighborhood SNP alleles, several other SNPs did exhibit enhanced loss or gain of function under certain genetic contexts, suggesting potential additional information hidden in the LD blocks.

Deep trans-omic network fusion reveals altered synaptic network in Alzheimer’s Disease (CSH, 2023-02-21) Xie, Linhui; Raj, Yash; Varathan, Pradeep; He, Bing; Nho, Kwangsik; Risacher, Shannon L.; Salama, Paul; Saykin, Andrew J.; Yan, Jingwen; Electrical and Computer Engineering, School of Engineering and Technology
Multi-omic data spanning genotype, gene expression, and protein expression have been increasingly explored to interpret findings from genome-wide association studies of Alzheimer’s disease (AD) and to gain more insight into the disease mechanism. However, each -omics data type is usually examined individually, and the functional interactions between genetic variations, genes and proteins are only used after discovery to interpret the findings, but not beforehand. In this case, multi-omic findings are likely not functionally related and therefore give rise to challenges in interpretation. To address this problem, we propose a new interpretable deep neural network model, MoFNet, to jointly model the prior knowledge of functional interactions and multi-omic data. It aims to identify a subnetwork of functional interactions predictive of AD evidenced by multi-omic measures. In particular, the prior functional interaction network was embedded into the architecture of MoFNet in a way that resembles the information flow from DNA to gene and protein. The proposed model MoFNet significantly outperformed all other state-of-the-art classifiers when evaluated using multi-omic data from the ROS/MAP cohort.
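One common way to embed a prior interaction network into a neural network layer is to mask the layer's weights so that only connections present in the prior network can carry signal. The sketch below illustrates that general idea for a single SNP-to-gene layer; it is an assumption about the style of approach, not MoFNet's actual implementation, and the function name, weights, mask, and inputs are all hypothetical.

```python
def masked_layer(inputs, weights, mask):
    """Linear layer restricted to edges in a prior network.

    inputs:  one value per input node (e.g. SNP dosages)
    weights: weights[i][j] is the weight from input i to output j
    mask:    mask[i][j] is 1 if the prior network links input i to
             output j, else 0 (that weight is then ignored)
    """
    n_out = len(mask[0])
    return [
        sum(x * w_row[j] * m_row[j] for x, w_row, m_row in zip(inputs, weights, mask))
        for j in range(n_out)
    ]


# Hypothetical prior: SNP0 and SNP1 map to gene A, SNP2 maps to gene B.
mask = [[1, 0],
        [1, 0],
        [0, 1]]
weights = [[0.5, 0.9],
           [0.2, 0.7],
           [0.1, 0.4]]
snp_dosages = [2, 1, 1]

gene_signal = masked_layer(snp_dosages, weights, mask)
print([round(v, 4) for v in gene_signal])
```

Stacking a second masked layer from genes to proteins would mirror the DNA-to-gene-to-protein flow described above, and the surviving nonzero weights can be read as a subnetwork, which is what makes such a model interpretable.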
Instead of individual markers, MoFNet yielded multi-omic sub-networks related to the innate immune system, clearance of misfolded proteins, and neurotransmitter release, respectively. Around 50% of these findings were replicated in another independent cohort. Our identified genes/proteins are highly related to synaptic vesicle function. Altered regulation or expression of these genes/proteins could cause disruption in neuron-neuron or neuron-glia cross talk and further lead to neuronal and synapse loss in AD. Further investigation of these identified genes/proteins could possibly help decipher the mechanisms underlying synaptic dysfunction in AD, and ultimately inform therapeutic strategies to modify AD progression in the early stage.