Browsing by Author "Shao, Wei"
Now showing 1 - 10 of 18

Item Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data (BMC, 2022-04-19)
Shao, Wei; Luo, Xiao; Zhang, Zuoyi; Han, Zhi; Chandrasekaran, Vasu; Turzhitsky, Vladimir; Bali, Vishal; Roberts, Anna R.; Metzger, Megan; Baker, Jarod; La Rosa, Carmen; Weaver, Jessica; Dexter, Paul; Huang, Kun
Biostatistics and Health Data Science, School of Medicine
Background: Chronic cough affects approximately 10% of adults. The lack of ICD codes for chronic cough makes it challenging to apply supervised learning methods to predict the characteristics of chronic cough patients, thereby requiring the identification of chronic cough patients by other mechanisms. We developed a deep clustering algorithm with auto-encoder embedding (DCAE) to identify clusters of chronic cough patients based on data from a large cohort of 264,146 patients from the Electronic Medical Records (EMR) system. We constructed features using the diagnosis within the EMR, then built a clustering-oriented loss function directly on embedded features of the deep autoencoder to jointly perform feature refinement and cluster assignment. Lastly, we performed statistical analysis on the identified clusters to characterize the chronic cough patients compared to the non-chronic cough patients. Results: The experimental results show that the DCAE model generated three chronic cough clusters and one non-chronic cough patient cluster. We found various diagnoses, medications, and lab tests highly associated with chronic cough patients by comparing the chronic cough cluster with the non-chronic cough cluster. Comparison of chronic cough clusters demonstrated that certain combinations of medications and diagnoses characterize some chronic cough clusters.
Conclusions: To the best of our knowledge, this study is the first to test the potential of unsupervised deep learning methods for chronic cough investigation, which also shows a great advantage over existing algorithms for patient data clustering.

Item Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images (Springer Nature, 2023-01-27)
Huang, Zhi; Shao, Wei; Han, Zhi; Alkashash, Ahmad Mahmoud; De la Sancha, Carlo; Parwani, Anil V.; Nitta, Hiroaki; Hou, Yanjun; Wang, Tongxin; Salama, Paul; Rizkalla, Maher; Zhang, Jie; Huang, Kun; Li, Zaibo
Electrical and Computer Engineering, School of Engineering and Technology
Advances in computational algorithms and tools have made the prediction of cancer patient outcomes using computational pathology feasible. However, predicting clinical outcomes from pre-treatment histopathologic images remains a challenging task, limited by the poor understanding of tumor immune micro-environments. In this study, an automatic, accurate, comprehensive, interpretable, and reproducible whole slide image (WSI) feature extraction pipeline, known as IMage-based Pathological REgistration and Segmentation Statistics (IMPRESS), is described. Using both H&E and multiplex IHC (PD-L1, CD8+, and CD163+) images, we investigated whether artificial intelligence (AI)-based algorithms using automatic feature extraction methods can predict neoadjuvant chemotherapy (NAC) outcomes in HER2-positive (HER2+) and triple-negative breast cancer (TNBC) patients. Features are derived from tumor immune micro-environment and clinical data and used to train machine learning models to accurately predict the response to NAC in breast cancer patients (HER2+ AUC = 0.8975; TNBC AUC = 0.7674). The results demonstrate that this method outperforms models trained on features that were manually generated by pathologists.
The developed image features and algorithms were further externally validated by independent cohorts, yielding encouraging results, especially for the HER2+ subtype.

Item BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes (BioMed Central, 2019-08-12)
Wang, Tongxin; Johnson, Travis S.; Shao, Wei; Lu, Zixiao; Helm, Bryan R.; Zhang, Jie; Huang, Kun
Medical and Molecular Genetics, School of Medicine
To fully utilize the power of single-cell RNA sequencing (scRNA-seq) technologies for identifying cell lineages and bona fide transcriptional signals, it is necessary to combine data from multiple experiments. We present BERMUDA (Batch Effect ReMoval Using Deep Autoencoders), a novel transfer-learning-based method for batch effect correction in scRNA-seq data. BERMUDA effectively combines different batches of scRNA-seq data with vastly different cell population compositions and amplifies biological signals by transferring information among batches. We demonstrate that BERMUDA outperforms existing methods for removing batch effects and distinguishing cell types in multiple simulated and real scRNA-seq datasets.

Item BrcaSeg: A Deep Learning Approach for Tissue Quantification and Genomic Correlations of Histopathological Images (Elsevier, 2021)
Lu, Zixiao; Zhan, Xiaohui; Wu, Yi; Cheng, Jun; Shao, Wei; Ni, Dong; Han, Zhi; Zhang, Jie; Feng, Qianjin; Huang, Kun
Medicine, School of Medicine
Epithelial and stromal tissues are components of the tumor microenvironment and play a major role in tumor initiation and progression. Distinguishing stroma from epithelial tissues is critically important for spatial characterization of the tumor microenvironment. Here, we propose BrcaSeg, an image analysis pipeline based on a convolutional neural network (CNN) model to classify epithelial and stromal regions in whole-slide hematoxylin and eosin (H&E) stained histopathological images.
The CNN model is trained using well-annotated breast cancer tissue microarrays and validated with images from The Cancer Genome Atlas (TCGA) Program. BrcaSeg achieves a classification accuracy of 91.02%, which outperforms other state-of-the-art methods. Using this model, we generate pixel-level epithelial/stromal tissue maps for 1000 TCGA breast cancer slide images that are paired with gene expression data. We subsequently estimate the epithelial and stromal ratios and perform correlation analysis to model the relationship between gene expression and tissue ratios. Gene Ontology (GO) enrichment analyses of genes that are highly correlated with tissue ratios suggest that the same tissue is associated with similar biological processes in different breast cancer subtypes, whereas each subtype also has its own idiosyncratic biological processes governing the development of these tissues. Taken all together, our approach can lead to new insights in exploring relationships between image-based phenotypes and their underlying genomic events and biological processes for all types of solid tumors. BrcaSeg can be accessed at https://github.com/Serian1992/ImgBio.

Item Computational analysis of pathological images enables a better diagnosis of TFE3 Xp11.2 translocation renal cell carcinoma (Nature Research, 2020)
Cheng, Jun; Han, Zhi; Mehra, Rohit; Shao, Wei; Cheng, Michael; Feng, Qianjin; Ni, Dong; Huang, Kun; Cheng, Liang; Zhang, Jie
Medicine, School of Medicine
TFE3 Xp11.2 translocation renal cell carcinoma (TFE3-RCC) generally progresses more aggressively compared with other RCC subtypes, but it is challenging to diagnose TFE3-RCC by traditional visual inspection of pathological images. In this study, we collect hematoxylin and eosin-stained histopathology whole-slide images of 74 TFE3-RCC cases (the largest cohort to date) and 74 clear cell RCC cases (ccRCC, the most common RCC subtype) with matched gender and tumor grade.
An automatic computational pipeline is implemented to extract image features. Comparative study identifies 52 image features with significant differences between TFE3-RCC and ccRCC. Machine learning models are built to distinguish TFE3-RCC from ccRCC. Tests of the classification models on an external validation set reveal high accuracy with areas under ROC curve ranging from 0.842 to 0.894. Our results suggest that automatically derived image features can capture subtle morphological differences between TFE3-RCC and ccRCC and contribute to a potential guideline for TFE3-RCC diagnosis.

Item Deep-Learning–Based Characterization of Tumor-Infiltrating Lymphocytes in Breast Cancers From Histopathology Images and Multiomics Data (American Society of Clinical Oncology, 2020-05)
Lu, Zixiao; Xu, Siwen; Shao, Wei; Wu, Yi; Zhang, Jie; Han, Zhi; Feng, Qianjin; Huang, Kun
Medicine, School of Medicine
Purpose: Tumor-infiltrating lymphocytes (TILs) and their spatial characterizations on whole-slide images (WSIs) of histopathology sections have become crucial in diagnosis, prognosis, and treatment response prediction for different cancers. However, fully automatic assessment of TILs on WSIs currently remains a great challenge because of the heterogeneity and large size of WSIs. We present an automatic pipeline based on a cascade-training U-net to generate high-resolution TIL maps on WSIs. Methods: We present global cell-level TIL maps and 43 quantitative TIL spatial image features for 1,000 WSIs of The Cancer Genome Atlas patients with breast cancer. For more specific analysis, all the patients were divided into three subtypes, namely, estrogen receptor (ER)-positive, ER-negative, and triple-negative groups. The associations between TIL scores and gene expression and somatic mutation were examined separately in three breast cancer subtypes.
Both univariate and multivariate survival analyses were performed on 43 TIL image features to examine the prognostic value of TIL spatial patterns in different breast cancer subtypes. Results: The TIL score was in strong association with immune response pathway and genes (eg, programmed death-1 and CTLA4). Different breast cancer subtypes showed TIL score in association with mutations from different genes, suggesting that different genetic alterations may lead to similar phenotypes. Spatial TIL features that represent density and distribution of TIL clusters were important indicators of the patient outcomes. Conclusion: Our pipeline can facilitate computational pathology-based discovery in cancer immunology and research on immunotherapy. Our analysis results are available for the research community to generate new hypotheses and insights on breast cancer immunology and development.

Item Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease (BMC, 2022-02-01)
Johnson, Travis S.; Yu, Christina Y.; Huang, Zhi; Xu, Siwen; Wang, Tongxin; Dong, Chuanpeng; Shao, Wei; Zaid, Mohammad Abu; Huang, Xiaoqing; Wang, Yijie; Bartlett, Christopher; Zhang, Yan; Walker, Brian A.; Liu, Yunlong; Huang, Kun; Zhang, Jie
Medicine, School of Medicine
We propose DEGAS (Diagnostic Evidence GAuge of Single cells), a novel deep transfer learning framework, to transfer disease information from patients to cells. We call such transferrable information "impressions," which allow individual cells to be associated with disease attributes like diagnosis, prognosis, and response to therapy. Using simulated data and ten diverse single-cell and patient bulk tissue transcriptomic datasets from glioblastoma multiforme (GBM), Alzheimer's disease (AD), and multiple myeloma (MM), we demonstrate the feasibility, flexibility, and broad applications of the DEGAS framework.
DEGAS analysis on myeloma single-cell transcriptomics identified PHF19high myeloma cells associated with progression.

Item Integrative analysis of histopathological images and chromatin accessibility data for estrogen receptor-positive breast cancer (BMC, 2020-12-28)
Xu, Siwen; Lu, Zixiao; Shao, Wei; Yu, Christina Y.; Reiter, Jill L.; Feng, Qianjin; Feng, Weixing; Huang, Kun; Liu, Yunlong
Medicine, School of Medicine
Background: Existing studies have demonstrated that the integrative analysis of histopathological images and genomic data can be used to better understand the onset and progression of many diseases, as well as identify new diagnostic and prognostic biomarkers. However, since the development of pathological phenotypes is influenced by a variety of complex biological processes, complete understanding of the underlying gene regulatory mechanisms for the cell and tissue morphology is still a challenge. In this study, we explored the relationship between the chromatin accessibility changes and the epithelial tissue proportion in histopathological images of estrogen receptor (ER) positive breast cancer. Methods: An established whole slide image processing pipeline based on deep learning was used to perform global segmentation of epithelial and stromal tissues. We then used canonical correlation analysis to detect the epithelial tissue proportion-associated regulatory regions. By integrating ATAC-seq data with matched RNA-seq data, we found the potential target genes that are associated with these regulatory regions. Then we used these genes to perform the subsequent pathway and survival analyses. Results: Using canonical correlation analysis, we detected 436 potential regulatory regions that exhibited significant correlation between quantitative chromatin accessibility changes and the epithelial tissue proportion in tumors from 54 patients (FDR < 0.05). We then found that these 436 regulatory regions were associated with 74 potential target genes.
After functional enrichment analysis, we observed that these potential target genes were enriched in cancer-associated pathways. We further demonstrated that using the gene expression signals and the epithelial tissue proportion extracted from this integration framework could stratify patient prognoses more accurately, outperforming predictions based on only omics or image features. Conclusion: This integrative analysis is a useful strategy for identifying potential regulatory regions in the human genome that are associated with tumor tissue quantification. This study will enable efficient prioritization of genomic regulatory regions identified by ATAC-seq data for further studies to validate their causal regulatory function. Ultimately, identifying epithelial tissue proportion-associated regulatory regions will further our understanding of the underlying molecular mechanisms of disease and inform the development of potential therapeutic targets.

Item Machine Learning Based Classification from Whole-Slide Histopathological Images Enables Reliable and Interpretable Diagnosis of Inverted Urothelial Papilloma (Elsevier, 2021-11-05)
Shao, Wei; Cheng, Michael; Huang, Zhi; Han, Zhi; Wang, Tongxin; Lopez-Beltran, Antonio; Osunkoya, Adeboye O.; Zhang, Jie; Cheng, Liang; Huang, Kun
Medicine, School of Medicine
Inverted urothelial papilloma (IUP) is a benign neoplasm of the urinary tract that accounts for less than 1% of urothelial tumors. It is diagnostically challenging for pathologists to distinguish histological features of IUP from other subtypes of non-invasive urothelial carcinoma, such as inverted Ta urothelial carcinoma (UCInv) and low-grade Ta urothelial carcinoma (UCLG).
Using a machine learning approach, we analyzed the H&E-stained whole-slide histopathological images of 64 IUP (the largest cohort to date), 69 UCInv, and 92 UCLG samples, and propose a reliable, reproducible, and interpretable machine learning pipeline to classify IUP from other non-invasive urothelial carcinomas. The results showed that our method could achieve an area under the ROC curve of 0.913 and 0.920 for classifying IUP from UCInv and UCLG, respectively, which is superior to competing methods, including deep learning-based methods. Testing of the classification models on an external validation dataset confirmed that our model can effectively identify IUP with high accuracy. Our results suggest that the proposed machine learning pipeline can robustly and accurately capture histopathological differences between IUP and other urothelial carcinoma subtypes, which can be extended to identify other rare cancer subtypes with limited samples and has the potential to expand the clinician’s armamentarium for accurate diagnosis.

Item MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification (Springer Nature, 2021-06-08)
Wang, Tongxin; Shao, Wei; Huang, Zhi; Tang, Haixu; Zhang, Jie; Ding, Zhengming; Huang, Kun
Medicine, School of Medicine
To fully utilize the advances in omics technologies and achieve a more comprehensive understanding of human diseases, novel computational methods are required for integrative analysis of multiple types of omics data. Here, we present a novel multi-omics integrative method named Multi-Omics Graph cOnvolutional NETworks (MOGONET) for biomedical classification. MOGONET jointly explores omics-specific learning and cross-omics correlation learning for effective multi-omics data classification.
We demonstrate that MOGONET outperforms other state-of-the-art supervised multi-omics integrative analysis approaches in different biomedical classification applications using mRNA expression data, DNA methylation data, and microRNA expression data. Furthermore, MOGONET can identify important biomarkers from different omics data types related to the investigated biomedical problems.