Browsing by Author "Zhan, Xiaohui"
Now showing 1 - 9 of 9
Item BrcaSeg: A Deep Learning Approach for Tissue Quantification and Genomic Correlations of Histopathological Images (Elsevier, 2021)
Lu, Zixiao; Zhan, Xiaohui; Wu, Yi; Cheng, Jun; Shao, Wei; Ni, Dong; Han, Zhi; Zhang, Jie; Feng, Qianjin; Huang, Kun; Medicine, School of Medicine
Epithelial and stromal tissues are components of the tumor microenvironment and play a major role in tumor initiation and progression. Distinguishing stroma from epithelial tissues is critically important for spatial characterization of the tumor microenvironment. Here, we propose BrcaSeg, an image analysis pipeline based on a convolutional neural network (CNN) model to classify epithelial and stromal regions in whole-slide hematoxylin and eosin (H&E) stained histopathological images. The CNN model is trained using well-annotated breast cancer tissue microarrays and validated with images from The Cancer Genome Atlas (TCGA) Program. BrcaSeg achieves a classification accuracy of 91.02%, which outperforms other state-of-the-art methods. Using this model, we generate pixel-level epithelial/stromal tissue maps for 1000 TCGA breast cancer slide images that are paired with gene expression data. We subsequently estimate the epithelial and stromal ratios and perform correlation analysis to model the relationship between gene expression and tissue ratios. Gene Ontology (GO) enrichment analyses of genes that are highly correlated with tissue ratios suggest that the same tissue is associated with similar biological processes in different breast cancer subtypes, whereas each subtype also has its own idiosyncratic biological processes governing the development of these tissues. Taken all together, our approach can lead to new insights in exploring relationships between image-based phenotypes and their underlying genomic events and biological processes for all types of solid tumors.
BrcaSeg can be accessed at https://github.com/Serian1992/ImgBio.

Item Computational Image Analysis Identifies Histopathological Image Features Associated With Somatic Mutations and Patient Survival in Gastric Adenocarcinoma (Frontiers Media, 2021-03-31)
Cheng, Jun; Liu, Yuting; Huang, Wei; Hong, Wenhui; Wang, Lingling; Zhan, Xiaohui; Han, Zhi; Ni, Dong; Huang, Kun; Zhang, Jie; Medicine, School of Medicine
Computational analysis of histopathological images can identify sub-visual objective image features that may not be visually distinguishable by human eyes, and hence provides better modeling of disease phenotypes. This study aims to investigate whether specific image features are associated with somatic mutations and patient survival in gastric adenocarcinoma (sample size = 310). An automated image analysis pipeline was developed to extract quantitative morphological features from H&E-stained whole-slide images. We found that four frequently somatically mutated genes (TP53, ARID1A, OBSCN, and PIK3CA) were significantly associated with tumor morphological changes. A prognostic model built on the image features significantly stratified patients into low-risk and high-risk groups (log-rank test p-value = 2.6e-4). Multivariable Cox regression showed that the model-predicted risk index was an additional prognostic factor besides tumor grade and stage. Gene ontology enrichment analysis showed that the genes whose expression correlated most strongly with the contributing features in the prognostic model were enriched in biological processes such as cell cycle and muscle contraction. These results demonstrate that histopathological image features can reflect underlying somatic mutations and identify high-risk patients who may benefit from more precise treatment regimens.
Both the image features and pipeline are highly interpretable to enable translational applications.

Item Correlation Analysis of Histopathology and Proteogenomics Data for Breast Cancer (American Society for Biochemistry and Molecular Biology, 2019-08-09)
Zhan, Xiaohui; Cheng, Jun; Huang, Zhi; Han, Zhi; Helm, Bryan; Liu, Xiaowen; Zhang, Jie; Wang, Tian-Fu; Ni, Dong; Huang, Kun; Medicine, School of Medicine
Tumors are heterogeneous tissues with different types of cells such as cancer cells, fibroblasts, and lymphocytes. Although the morphological features of tumors are critical for cancer diagnosis and prognosis, the underlying molecular events and genes for tumor morphology are far from being clear. With the advancement in computational pathology and the accumulation of large numbers of cancer samples with matched molecular and histopathology data, researchers can carry out integrative analysis to investigate this issue. In this study, we systematically examine the relationships between morphological features and various molecular data in breast cancers. Specifically, we identified 73 breast cancer patients from the TCGA and CPTAC projects with matched whole-slide images, RNA-seq, and proteomic data. By calculating 100 different morphological features and correlating them with the transcriptomic and proteomic data, we inferred four major biological processes associated with various interpretable morphological features. These processes include metabolism, cell cycle, immune response, and extracellular matrix development, which are all hallmarks of cancers, and the associated morphological features are related to area, density, and shapes of epithelial cells, fibroblasts, and lymphocytes. In addition, protein-specific biological processes were inferred solely from proteomic data, suggesting the importance of proteomic data in obtaining a holistic understanding of the molecular basis for tumor tissue morphology.
Furthermore, survival analysis yielded specific morphological features related to patient prognosis, which have a strong association with important molecular events based on our analysis. Overall, our study demonstrated the power of integrating multiple types of biological data for cancer samples in generating new hypotheses as well as identifying potential biomarkers predicting patient outcome. Future work includes causal analysis to identify key regulators for cancer tissue development and validating the findings using more independent data sets.

Item Deep learning-based cancer survival prognosis from RNA-seq data: approaches and evaluations (BMC, 2020)
Huang, Zhi; Johnson, Travis S.; Han, Zhi; Helm, Bryan; Cao, Sha; Zhang, Chi; Salama, Paul; Rizkalla, Maher; Yu, Christina Y.; Cheng, Jun; Xiang, Shunian; Zhan, Xiaohui; Zhang, Jie; Huang, Kun; Medicine, School of Medicine
Background: Recent advances in kernel-based Deep Learning models have introduced a new era in medical research. Originally designed for pattern recognition and image processing, Deep Learning models are now applied to survival prognosis of cancer patients. Specifically, Deep Learning versions of the Cox proportional hazards models are trained with transcriptomic data to predict survival outcomes in cancer patients. Methods: In this study, a broad analysis was performed on TCGA cancers using a variety of Deep Learning-based models, including Cox-nnet, DeepSurv, and a method proposed by our group named AECOX (AutoEncoder with Cox regression network). Concordance index and p-value of the log-rank test are used to evaluate the model performances. Results: All models show competitive results across 12 cancer types. The last hidden layers of the Deep Learning approaches are lower dimensional representations of the input data that can be used for feature reduction and visualization.
Furthermore, the prognosis performances reveal a negative correlation between model accuracy, overall survival time statistics, and tumor mutation burden (TMB), suggesting an association among overall survival time, TMB, and prognosis prediction accuracy. Conclusions: Deep Learning based algorithms demonstrate performance superior to that of traditional machine learning based models. The cancer prognosis results measured in concordance index are indistinguishable across models but highly variable across cancers. These findings shed some light on the relationships between patient characteristics and survival learnability on a pan-cancer level.

Item Gene Co-expression Network and Copy Number Variation Analyses Identify Transcription Factors Associated With Multiple Myeloma Progression (Frontiers, 2019-05-17)
Yu, Christina Y.; Xiang, Shunian; Huang, Zhi; Johnson, Travis S.; Zhan, Xiaohui; Han, Zhi; Abu Zaid, Mohammad; Huang, Kun; Medicine, School of Medicine
Multiple myeloma (MM) has two clinical precursor stages of disease: monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM). However, the mechanism of progression is not well understood. Because gene co-expression network analysis is a well-known method for discovering new gene functions and regulatory relationships, we utilized this framework to conduct differential co-expression analysis to identify interesting transcription factors (TFs) in two publicly available datasets. We then used copy number variation (CNV) data from a third public dataset to validate these TFs. First, we identified co-expressed gene modules in two publicly available datasets each containing three conditions: normal, MGUS, and SMM. These modules were assessed for condition-specific gene expression, and then enrichment analysis was conducted on condition-specific modules to identify their biological function and upstream TFs.
TFs were assessed for differential gene expression between normal and MM precursors, then validated with CNV analysis to identify candidate genes. Functional enrichment analysis reaffirmed known functional categories in MM pathology, the main one relating to immune function. Enrichment analysis revealed a handful of differentially expressed TFs between normal and either MGUS or SMM in gene expression and/or CNV. Overall, we identified four genes of interest (MAX, TCF4, ZNF148, and ZNF281) that aid in our understanding of MM initiation and progression.

Item Gene Co-Expression Networks Restructured Gene Fusion in Rhabdomyosarcoma Cancers (MDPI, 2019-08-30)
Helm, Bryan R.; Zhan, Xiaohui; Pandya, Pankita H.; Murray, Mary E.; Pollok, Karen E.; Renbarger, Jamie L.; Ferguson, Michael J.; Han, Zhi; Ni, Dong; Zhang, Jie; Huang, Kun; Medicine, School of Medicine
Rhabdomyosarcoma is subclassified by the presence or absence of a recurrent chromosome translocation that fuses the FOXO1 and PAX3 or PAX7 genes. The fusion protein (FOXO1-PAX3/7) retains both binding domains and becomes a novel and potent transcriptional regulator in rhabdomyosarcoma subtypes. Many studies have characterized and integrated genomic, transcriptomic, and epigenomic differences among rhabdomyosarcoma subtypes that contain the FOXO1-PAX3/7 gene fusion and those that do not; however, few studies have investigated how gene co-expression networks are altered by FOXO1-PAX3/7. Although transcriptional data offer insight into one level of functional regulation, gene co-expression networks have the potential to identify biological interactions and pathways that underpin oncogenesis and tumorigenicity. Thus, we examined gene co-expression networks for rhabdomyosarcoma that were FOXO1-PAX3 positive, FOXO1-PAX7 positive, or fusion negative. Gene co-expression networks were mined using local maximum Quasi-Clique Merger (lmQCM) and analyzed for co-expression differences among rhabdomyosarcoma subtypes.
This analysis observed 41 co-expression modules that were shared between fusion negative and positive samples, of which 17/41 showed significant up- or down-regulation with respect to fusion status. Fusion positive and negative rhabdomyosarcoma showed differing modularity of co-expression networks with fusion negative (n = 109) having significantly more individual modules than fusion positive (n = 53). Subsequent analysis of gene co-expression networks for PAX3 and PAX7 type fusions observed 17/53 were differentially expressed between the two subtypes. Gene list enrichment analysis found that gene ontology terms were poorly matched with biological processes and molecular function for most co-expression modules identified in this study; however, co-expressed modules were frequently localized to cytobands on chromosomes 8 and 11. Overall, we observed substantial restructuring of co-expression networks relative to fusion status and fusion type in rhabdomyosarcoma and identified previously overlooked genes and pathways that may be targeted in this pernicious disease.

Item A pan-kidney cancer study identifies subtype specific perturbations on pathways with potential drivers in renal cell carcinoma (BMC, 2020-12-28)
Zhan, Xiaohui; Liu, Yusong; Yu, Christina Y.; Wang, Tian-Fu; Zhang, Jie; Ni, Dong; Huang, Kun; Medicine, School of Medicine
Background: Renal cell carcinoma (RCC) is a complex disease and is comprised of several histological subtypes, the most frequent of which are clear cell renal cell carcinoma (ccRCC), papillary renal cell carcinoma (PRCC) and chromophobe renal cell carcinoma (ChRCC). While many studies have been performed to investigate the molecular characterizations of different subtypes of RCC, our knowledge regarding the underlying mechanisms is still incomplete.
As molecular alterations are eventually reflected on the pathway level to execute certain biological functions, characterizing the pathway perturbations is crucial for understanding tumorigenesis and development of RCC. Methods: In this study, we investigated the pathway perturbations of various RCC subtypes against normal tissue based on differentially expressed genes within a certain pathway. We explored the potential upstream regulators of subtype-specific pathways with Ingenuity Pathway Analysis (IPA). We also evaluated the relationships between subtype-specific pathways and clinical outcome with survival analysis. Results: In this study, we carried out a pathway-based analysis to explore the mechanisms of various RCC subtypes with TCGA RNA-seq data. Both commonly altered pathways and subtype-specific pathways were detected. To identify the distinctive characteristics of each subtype, we focused on subtype-specific perturbed pathways. Specifically, we observed that some of the altered pathways were regulated by several recurrent upstream regulators which present different expression patterns among distinct RCC subtypes. We also noticed that a large number of perturbed pathways were controlled by the subtype-specific upstream regulators. Moreover, we evaluated the relationships between perturbed pathways and clinical outcome. Prognostic pathways were identified and their roles in tumor development and progression were inferred. Conclusions: In summary, we evaluated the relationships among pathway perturbations, upstream regulators and clinical outcome for different subtypes in RCC. We hypothesized that the alterations of common upstream regulators as well as subtype-specific upstream regulators work together to affect the downstream pathway perturbations and drive cancer initiation and prognosis.
Our findings not only increase our understanding of the mechanisms of various RCC subtypes, but also provide targets for personalized therapeutic intervention.

Item SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer (Frontiers Media, 2019-03-08)
Huang, Zhi; Zhan, Xiaohui; Xiang, Shunian; Johnson, Travis S.; Helm, Bryan; Yu, Christina Y.; Zhang, Jie; Salama, Paul; Rizkalla, Maher; Han, Zhi; Huang, Kun; Department of Medicine, Indiana University School of Medicine
Improved cancer prognosis is a central goal for precision health medicine. Though many models can predict differential survival from data, there is a strong need for sophisticated algorithms that can aggregate and filter relevant predictors from increasingly complex data inputs. In turn, these models should provide deeper insight into which types of data are most relevant to improve prognosis. Deep Learning-based neural networks offer a potential solution for both problems because they are highly flexible and account for data complexity in a non-linear fashion. In this study, we implement Deep Learning-based networks to determine how gene expression data predicts Cox regression survival in breast cancer. We accomplish this through an algorithm called SALMON (Survival Analysis Learning with Multi-Omics Neural Networks), which aggregates and simplifies gene expression data and cancer biomarkers to enable prognosis prediction. The results revealed improved performance when more omics data were used in model construction. Rather than use raw gene expression values as model inputs, we innovatively use eigengene modules from the result of gene co-expression network analysis. The corresponding high impact co-expression modules and other omics data are identified by a feature selection technique, then examined by conducting enrichment analysis and exploiting biological functions, which elevates the interpretation of input features from the gene level to the co-expression module level.
Our study shows the feasibility of discovering breast cancer related co-expression modules and sketches a blueprint for future endeavors on Deep Learning-based survival analysis. SALMON source code is available at https://github.com/huangzhii/SALMON/.

Item TPQCI: A topology potential-based method to quantify functional influence of copy number variations (Elsevier, 2021-08)
Liu, Yusong; Ye, Xiufen; Zhan, Xiaohui; Yu, Christina Y.; Zhang, Jie; Huang, Kun; Medical and Molecular Genetics, School of Medicine
Copy number variation (CNV) is a major type of chromosomal structural variation that plays important roles in many diseases including cancers. Due to genome instability, a large number of CNV events can be detected in diseases such as cancer. Therefore, it is important to identify the functionally important CNVs in diseases, which currently still poses a challenge in genomics. One of the critical steps to solve the problem is to define the influence of CNV. In this paper, we provide a topology potential based method, TPQCI, to quantify this kind of influence by integrating statistics, gene regulatory associations, and biological function information. We used this metric to detect functionally enriched genes on genomic segments with CNV in breast cancer and multiple myeloma and discovered biological functions influenced by CNV. Our results demonstrate that, by using our proposed TPQCI metric, we can detect disease-specific genes that are influenced by CNVs. Source code for TPQCI is available on GitHub (https://github.com/usos/TPQCI).