Browsing by Author "Yu, Christina Y."
Condition-specific gene co-expression network mining identifies key pathways and regulators in the brain tissue of Alzheimer's disease patients (Biomed Central, 2018-12-31) Xiang, Shunian; Huang, Zhi; Wang, Tianfu; Han, Zhi; Yu, Christina Y.; Ni, Dong; Huang, Kun; Zhang, Jie; Medicine, School of Medicine

BACKGROUND: Gene co-expression network (GCN) mining is a systematic approach to efficiently identify novel disease pathways, predict novel gene functions and search for potential disease biomarkers. However, few studies have systematically identified GCNs in multiple brain transcriptomic datasets from Alzheimer's disease (AD) patients and examined their specific functions. METHODS: In this study, we first mined GCN modules from AD and normal brain samples in multiple datasets separately; then identified gene modules that are specific to AD or normal samples; and lastly, merged condition-specific modules with similar functional enrichments and examined the AD/normal-specific modules for enriched, differentially expressed upstream transcription factors. RESULTS: Using the network mining tool lmQCM, we obtained 30 AD-specific modules, which showed gain of correlation in AD samples, and 31 normal-specific modules, which showed loss of correlation in AD samples compared to normal ones. Functional and pathway enrichment analysis not only confirmed known gene functional categories related to AD, but also identified novel regulatory factors and pathways. Remarkably, pathway analysis suggested that a variety of viral, bacterial, and parasitic infection pathways are activated in AD samples. Furthermore, upstream transcription factor analysis identified differentially expressed upstream regulators such as ZFHX3 for several modules, which may be potential driver genes for AD etiology and pathology.
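The gain or loss of correlation described in RESULTS can be illustrated with a toy calculation. The following is a minimal Python sketch, not the lmQCM algorithm and not the authors' pipeline: it assumes a module is given as a genes × samples expression matrix for each condition and summarizes co-expression as the mean pairwise Pearson correlation among the module's genes. The function names `mean_pairwise_correlation` and `correlation_shift` and the simulated data are hypothetical.

```python
import numpy as np


def mean_pairwise_correlation(expr: np.ndarray) -> float:
    """Average off-diagonal Pearson correlation among the genes of a module.

    expr: array of shape (n_genes, n_samples), one row per gene.
    """
    corr = np.corrcoef(expr)  # rows are treated as variables (genes)
    n = corr.shape[0]
    off_diag = corr[~np.eye(n, dtype=bool)]  # drop the trivial self-correlations
    return float(off_diag.mean())


def correlation_shift(module_ad: np.ndarray, module_normal: np.ndarray) -> float:
    """Positive: gain of correlation in AD; negative: loss of correlation in AD."""
    return mean_pairwise_correlation(module_ad) - mean_pairwise_correlation(module_normal)


rng = np.random.default_rng(0)
n_samples = 50

# Toy 5-gene module: in the simulated "AD" samples the genes share a hidden factor,
# while in the simulated "normal" samples they vary independently.
latent = rng.normal(size=n_samples)
module_ad = latent + 0.3 * rng.normal(size=(5, n_samples))
module_normal = rng.normal(size=(5, n_samples))

shift = correlation_shift(module_ad, module_normal)
print(f"correlation shift (AD - normal): {shift:.2f}")
```

A positive shift corresponds to a module that is more tightly co-expressed in AD samples (gain of correlation), and a negative shift to loss of correlation. How the study actually detects modules with lmQCM and judges condition specificity is not reproduced here.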
CONCLUSIONS: Through our state-of-the-art network-based approach, AD/normal-specific GCN modules were identified using multiple transcriptomic datasets from multiple brain regions. Bacterial and viral infectious disease-related pathways are the most frequently enriched in modules across datasets. Transcription factor ZFHX3 was identified as a potential driver regulator targeting the infectious disease pathways in AD-specific modules. Our results provide new directions for understanding the mechanism of AD, as well as new candidate drug targets.

Deep learning-based cancer survival prognosis from RNA-seq data: approaches and evaluations (BMC, 2020) Huang, Zhi; Johnson, Travis S.; Han, Zhi; Helm, Bryan; Cao, Sha; Zhang, Chi; Salama, Paul; Rizkalla, Maher; Yu, Christina Y.; Cheng, Jun; Xiang, Shunian; Zhan, Xiaohui; Zhang, Jie; Huang, Kun; Medicine, School of Medicine

Background: Recent advances in kernel-based Deep Learning models have introduced a new era in medical research. Originally designed for pattern recognition and image processing, Deep Learning models are now applied to survival prognosis of cancer patients. Specifically, Deep Learning versions of the Cox proportional hazards model are trained with transcriptomic data to predict survival outcomes in cancer patients. Methods: In this study, a broad analysis was performed on TCGA cancers using a variety of Deep Learning-based models, including Cox-nnet, DeepSurv, and a method proposed by our group named AECOX (AutoEncoder with Cox regression network). The concordance index and the p-value of the log-rank test were used to evaluate model performance. Results: All models show competitive results across 12 cancer types. The last hidden layers of the Deep Learning approaches are lower-dimensional representations of the input data that can be used for feature reduction and visualization.
Furthermore, prognosis performance reveals a negative correlation between model accuracy, overall survival time statistics, and tumor mutation burden (TMB), suggesting an association among overall survival time, TMB, and prognosis prediction accuracy. Conclusions: Deep Learning-based algorithms demonstrate superior performance compared with traditional machine learning-based models. Cancer prognosis results measured by concordance index are indistinguishable across models but highly variable across cancers. These findings shed some light on the relationships between patient characteristics and survival learnability on a pan-cancer level.

Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease (BMC, 2022-02-01) Johnson, Travis S.; Yu, Christina Y.; Huang, Zhi; Xu, Siwen; Wang, Tongxin; Dong, Chuanpeng; Shao, Wei; Zaid, Mohammad Abu; Huang, Xiaoqing; Wang, Yijie; Bartlett, Christopher; Zhang, Yan; Walker, Brian A.; Liu, Yunlong; Huang, Kun; Zhang, Jie; Medicine, School of Medicine

We propose DEGAS (Diagnostic Evidence GAuge of Single cells), a novel deep transfer learning framework, to transfer disease information from patients to cells. We call such transferable information "impressions," which allow individual cells to be associated with disease attributes such as diagnosis, prognosis, and response to therapy. Using simulated data and ten diverse single-cell and patient bulk tissue transcriptomic datasets from glioblastoma multiforme (GBM), Alzheimer's disease (AD), and multiple myeloma (MM), we demonstrate the feasibility, flexibility, and broad applications of the DEGAS framework.
DEGAS analysis on myeloma single-cell transcriptomics identified PHF19-high myeloma cells associated with progression.

Gene Co-expression Network and Copy Number Variation Analyses Identify Transcription Factors Associated With Multiple Myeloma Progression (Frontiers, 2019-05-17) Yu, Christina Y.; Xiang, Shunian; Huang, Zhi; Johnson, Travis S.; Zhan, Xiaohui; Han, Zhi; Abu Zaid, Mohammad; Huang, Kun; Medicine, School of Medicine

Multiple myeloma (MM) has two clinical precursor stages of disease: monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM). However, the mechanism of progression is not well understood. Because gene co-expression network analysis is a well-known method for discovering new gene functions and regulatory relationships, we used this framework to conduct differential co-expression analysis and identify transcription factors (TFs) of interest in two publicly available datasets. We then used copy number variation (CNV) data from a third public dataset to validate these TFs. First, we identified co-expressed gene modules in two publicly available datasets, each containing three conditions: normal, MGUS, and SMM. These modules were assessed for condition-specific gene expression, and enrichment analysis was then conducted on condition-specific modules to identify their biological functions and upstream TFs. TFs were assessed for differential gene expression between normal and MM precursors, then validated with CNV analysis to identify candidate genes. Functional enrichment analysis reaffirmed known functional categories in MM pathology, the main one relating to immune function. Enrichment analysis revealed a handful of TFs that differed between normal and either MGUS or SMM in gene expression and/or CNV.
Overall, we identified four genes of interest (MAX, TCF4, ZNF148, and ZNF281) that aid in our understanding of MM initiation and progression.

Integrative analysis of histopathological images and chromatin accessibility data for estrogen receptor-positive breast cancer (BMC, 2020-12-28) Xu, Siwen; Lu, Zixiao; Shao, Wei; Yu, Christina Y.; Reiter, Jill L.; Feng, Qianjin; Feng, Weixing; Huang, Kun; Liu, Yunlong; Medicine, School of Medicine

Background: Existing studies have demonstrated that the integrative analysis of histopathological images and genomic data can be used to better understand the onset and progression of many diseases, as well as to identify new diagnostic and prognostic biomarkers. However, since the development of pathological phenotypes is influenced by a variety of complex biological processes, a complete understanding of the gene regulatory mechanisms underlying cell and tissue morphology remains a challenge. In this study, we explored the relationship between chromatin accessibility changes and the epithelial tissue proportion in histopathological images of estrogen receptor (ER)-positive breast cancer. Methods: An established deep learning-based whole slide image processing pipeline was used to perform global segmentation of epithelial and stromal tissues. We then used canonical correlation analysis to detect regulatory regions associated with epithelial tissue proportion. By integrating ATAC-seq data with matched RNA-seq data, we identified potential target genes associated with these regulatory regions. We then used these genes to perform pathway and survival analyses. Results: Using canonical correlation analysis, we detected 436 potential regulatory regions that exhibited significant correlation between quantitative chromatin accessibility changes and the epithelial tissue proportion in tumors from 54 patients (FDR < 0.05).
We then found that these 436 regulatory regions were associated with 74 potential target genes. Functional enrichment analysis showed that these potential target genes were enriched in cancer-associated pathways. We further demonstrated that the gene expression signals and the epithelial tissue proportion extracted from this integration framework could stratify patient prognoses more accurately than predictions based on omics or image features alone. Conclusion: This integrative analysis is a useful strategy for identifying potential regulatory regions in the human genome that are associated with tumor tissue quantification. This study will enable efficient prioritization of genomic regulatory regions identified by ATAC-seq data for further studies to validate their causal regulatory function. Ultimately, identifying epithelial tissue proportion-associated regulatory regions will further our understanding of the underlying molecular mechanisms of disease and inform the development of potential therapeutic targets.

Intron retention-induced neoantigen load correlates with unfavorable prognosis in multiple myeloma (Springer Nature, 2021-10) Dong, Chuanpeng; Cesarano, Annamaria; Bombaci, Giuseppe; Reiter, Jill L.; Yu, Christina Y.; Wang, Yue; Jiang, Zhaoyang; Zaid, Mohammad Abu; Huang, Kun; Lu, Xiongbin; Walker, Brian A.; Perna, Fabiana; Liu, Yunlong; BioHealth Informatics, School of Informatics and Computing

Neoantigen peptides arising from genetic alterations may serve as targets for personalized cancer vaccines and as positive predictors of response to immune checkpoint therapy. Mutations in genes regulating RNA splicing are common in hematological malignancies, leading to dysregulated splicing and intron retention (IR). In this study, we investigated IR as a potential source of tumor neoantigens in multiple myeloma (MM) patients and the relationship of IR-induced neoantigens (IR-neoAg) with clinical outcomes.
MM-specific IR events were identified in RNA-sequencing data from the Multiple Myeloma Research Foundation CoMMpass study after removing IR events that also occurred in normal plasma cells. We quantified the IR-neoAg load by assessing IR-induced novel peptides that were predicted to bind to major histocompatibility complex (MHC) molecules. We found that high IR-neoAg load was associated with poor overall survival in both newly diagnosed and relapsed MM patients. Further analyses revealed that poor outcome in MM patients with high IR-neoAg load was associated with high expression levels of T-cell co-inhibitory molecules and elevated interferon signaling activity. We also found that MM cells exhibiting high IR levels had lower MHC-II protein abundance, and that treatment of MM cells with a spliceosome inhibitor resulted in increased MHC-I protein abundance. Our findings suggest that IR-neoAg may represent a novel biomarker of clinical outcome in MM patients, and further that targeting RNA splicing may serve as a potential therapeutic strategy to prevent MM immune escape and promote response to checkpoint blockade.

LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection (Oxford Academic, 2019-04) Johnson, Travis S.; Wang, Tongxin; Huang, Zhi; Yu, Christina Y.; Wu, Yi; Han, Yatong; Zhang, Yan; Huang, Kun; Zhang, Jie; Medicine, School of Medicine

Motivation: Rapid advances in single-cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to (i) remove systematic biases, (ii) model multiple datasets with ambiguous labels and (iii) classify cells and map cell type labels. However, most methods address only one of these problems, on broad cell types or simulated data, using a single model type.
It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously and generalizability beyond a single model type. Results: We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied our framework to simulated, pancreas and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy simulated 1 datasets: 90%; simulated 2 datasets: 94%; pancreas datasets: 88% and brain datasets: 66%) using a LAmbDA Feedforward 1 Layer Neural Network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes in brain than two other state-of-the-art methods, scmap and CaSTLe (66% versus 60% and 32%). Furthermore, it correctly predicted ambiguous cellular subtype labels across datasets in 88% of test cases, compared with CaSTLe (63%), scmap (50%) and MetaNeighbor (50%). LAmbDA is model- and dataset-independent and generalizable to diverse data types, representing an advance in biocomputing.

A pan-kidney cancer study identifies subtype specific perturbations on pathways with potential drivers in renal cell carcinoma (BMC, 2020-12-28) Zhan, Xiaohui; Liu, Yusong; Yu, Christina Y.; Wang, Tian-Fu; Zhang, Jie; Ni, Dong; Huang, Kun; Medicine, School of Medicine

Background: Renal cell carcinoma (RCC) is a complex disease comprising several histological subtypes, the most frequent of which are clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (PRCC) and chromophobe renal cell carcinoma (ChRCC).
While many studies have investigated the molecular characteristics of different RCC subtypes, our knowledge of the underlying mechanisms is still incomplete. As molecular alterations are eventually reflected at the pathway level to execute certain biological functions, characterizing pathway perturbations is crucial for understanding the tumorigenesis and development of RCC. Methods: In this study, we investigated the pathway perturbations of various RCC subtypes against normal tissue based on differentially expressed genes within each pathway. We explored the potential upstream regulators of subtype-specific pathways with Ingenuity Pathway Analysis (IPA). We also evaluated the relationships between subtype-specific pathways and clinical outcome with survival analysis. Results: We carried out a pathway-based analysis to explore the mechanisms of various RCC subtypes with TCGA RNA-seq data. Both commonly altered pathways and subtype-specific pathways were detected. To identify the distinctive characteristics of each subtype, we focused on subtype-specific perturbed pathways. Specifically, we observed that some of the altered pathways were regulated by several recurrent upstream regulators that presented different expression patterns among distinct RCC subtypes. We also noticed that a large number of perturbed pathways were controlled by subtype-specific upstream regulators. Moreover, we evaluated the relationships between perturbed pathways and clinical outcome. Prognostic pathways were identified, and their roles in tumor development and progression were inferred. Conclusions: In summary, we evaluated the relationships among pathway perturbations, upstream regulators and clinical outcome for different RCC subtypes.
We hypothesized that alterations of common upstream regulators as well as subtype-specific upstream regulators work together to affect downstream pathway perturbations and drive cancer initiation and prognosis. Our findings not only increase our understanding of the mechanisms of various RCC subtypes, but also provide targets for personalized therapeutic intervention.

regSNPs-ASB: A Computational Framework for Identifying Allele-Specific Transcription Factor Binding From ATAC-seq Data (Frontiers, 2020-07-29) Xu, Siwen; Feng, Weixing; Lu, Zixiao; Yu, Christina Y.; Shao, Wei; Nakshatri, Harikrishna; Reiter, Jill L.; Gao, Hongyu; Chu, Xiaona; Wang, Yue; Liu, Yunlong; Medical and Molecular Genetics, School of Medicine

Expression quantitative trait loci (eQTL) analysis is useful for identifying genetic variants correlated with gene expression; however, it cannot distinguish between causal and nearby non-functional variants. Because the majority of disease-associated SNPs are located in regulatory regions, they can impact allele-specific binding (ASB) of transcription factors and result in differential expression of the target gene alleles. In this study, our aim was to identify functional single-nucleotide polymorphisms (SNPs) that alter transcriptional regulation and thus potentially impact cellular function. Here, we present regSNPs-ASB, a generalized linear model-based approach to identify regulatory SNPs located in transcription factor binding sites. The input for this model includes ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) raw read counts from heterozygous loci, where differential transposase-cleavage patterns between the two alleles indicate preferential transcription factor binding to one of the alleles. Using regSNPs-ASB, we identified 53 regulatory SNPs in human MCF-7 breast cancer cells and 125 regulatory SNPs in human mesenchymal stem cells (MSC).
By integrating the regSNPs-ASB output with RNA-seq experimental data and publicly available chromatin interaction data from MCF-7 cells, we found that these 53 regulatory SNPs were associated with 74 potential target genes and that 32 (43%) of these genes showed significant allele-specific expression. By comparing all of the MCF-7 and MSC regulatory SNPs to the eQTLs in the Genotype-Tissue Expression (GTEx) Project database, we found that 30% (16/53) of the regulatory SNPs in MCF-7 and 43% (52/122) of the regulatory SNPs in MSC were also in eQTL regions. The enrichment of regulatory SNPs in eQTLs (chi-square test, p-value < 0.01) indicated that many of them are likely responsible for allelic differences in gene expression. In summary, we conclude that regSNPs-ASB is a useful tool for identifying causal variants from ATAC-seq data. This new computational tool will enable efficient prioritization of genetic variants identified as eQTL for further studies to validate their causal regulatory function. Ultimately, identifying causal genetic variants will further our understanding of the underlying molecular mechanisms of disease and inform the eventual development of potential therapeutic targets.

SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer (Frontiers Media, 2019-03-08) Huang, Zhi; Zhan, Xiaohui; Xiang, Shunian; Johnson, Travis S.; Helm, Bryan; Yu, Christina Y.; Zhang, Jie; Salama, Paul; Rizkalla, Maher; Han, Zhi; Huang, Kun; Department of Medicine, Indiana University School of Medicine

Improved cancer prognosis is a central goal for precision health medicine. Though many models can predict differential survival from data, there is a strong need for sophisticated algorithms that can aggregate and filter relevant predictors from increasingly complex data inputs. In turn, these models should provide deeper insight into which types of data are most relevant to improve prognosis.
Deep Learning-based neural networks offer a potential solution to both problems because they are highly flexible and account for data complexity in a non-linear fashion. In this study, we implement Deep Learning-based networks to determine how gene expression data predicts Cox regression survival in breast cancer. We accomplish this through an algorithm called SALMON (Survival Analysis Learning with Multi-Omics Neural Networks), which aggregates and simplifies gene expression data and cancer biomarkers to enable prognosis prediction. The results revealed improved performance when more omics data were used in model construction. Rather than using raw gene expression values as model inputs, we innovatively use eigengene modules derived from gene co-expression network analysis. The high-impact co-expression modules and other omics data are identified by a feature selection technique and then examined through enrichment analysis of their biological functions, which elevates the interpretation of input features from the gene level to the co-expression module level. Our study shows the feasibility of discovering breast cancer-related co-expression modules and sketches a blueprint for future endeavors in Deep Learning-based survival analysis. SALMON source code is available at https://github.com/huangzhii/SALMON/.
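The use of eigengene modules in place of raw gene expression can be sketched briefly. This minimal Python example assumes the common WGCNA-style definition of a module eigengene as the first principal component of the module's standardized expression; SALMON's actual preprocessing, module detection, feature selection, and network architecture are not reproduced, and the function name `module_eigengene` and the simulated data are hypothetical.

```python
import numpy as np


def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """Return the first principal component of one module across samples.

    expr: array of shape (n_genes, n_samples) for the genes in one module.
    Returns a vector of length n_samples (one summary value per patient).
    """
    # Standardize each gene across samples: zero mean, unit variance.
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    # SVD of the genes x samples matrix; the first right singular vector
    # is the dominant expression pattern over samples (the eigengene).
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # Singular vectors have arbitrary sign; orient the eigengene so that it
    # correlates positively with the module's average standardized expression.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


rng = np.random.default_rng(1)
n_patients = 100

# Toy data: two 20-gene modules, each driven by its own hidden factor plus noise.
factors = rng.normal(size=(2, n_patients))
modules = []
for f in factors:
    loadings = rng.uniform(0.5, 1.5, size=(20, 1))
    modules.append(loadings * f + 0.5 * rng.normal(size=(20, n_patients)))

# One eigengene per module gives a compact (patients x modules) input matrix
# in place of the 40 raw gene columns.
features = np.column_stack([module_eigengene(m) for m in modules])
print(features.shape)  # (100, 2)

# Each eigengene should track the hidden factor that generated its module.
r = [np.corrcoef(features[:, k], factors[k])[0, 1] for k in range(2)]
print(np.round(r, 2))
```

Collapsing each module to a single column is what shrinks the input dimension; the abstract describes combining such module-level features with other omics data and cancer biomarkers before survival modeling, a step this sketch does not show.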