Browsing by Author "Wang, Tongxin"
Now showing 1 - 10 of 13
Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images (Springer Nature, 2023-01-27)
Huang, Zhi; Shao, Wei; Han, Zhi; Alkashash, Ahmad Mahmoud; De la Sancha, Carlo; Parwani, Anil V.; Nitta, Hiroaki; Hou, Yanjun; Wang, Tongxin; Salama, Paul; Rizkalla, Maher; Zhang, Jie; Huang, Kun; Li, Zaibo; Electrical and Computer Engineering, School of Engineering and Technology
Advances in computational algorithms and tools have made the prediction of cancer patient outcomes using computational pathology feasible. However, predicting clinical outcomes from pre-treatment histopathologic images remains a challenging task, limited by the poor understanding of tumor immune micro-environments. In this study, an automatic, accurate, comprehensive, interpretable, and reproducible whole slide image (WSI) feature extraction pipeline known as IMage-based Pathological REgistration and Segmentation Statistics (IMPRESS) is described. Using both H&E and multiplex IHC (PD-L1, CD8+, and CD163+) images, we investigated whether artificial intelligence (AI)-based algorithms using automatic feature extraction methods can predict neoadjuvant chemotherapy (NAC) outcomes in HER2-positive (HER2+) and triple-negative breast cancer (TNBC) patients. Features are derived from the tumor immune micro-environment and clinical data and used to train machine learning models that accurately predict the response to NAC in breast cancer patients (HER2+ AUC = 0.8975; TNBC AUC = 0.7674). The results demonstrate that this method outperforms models trained on features that were manually generated by pathologists.
The developed image features and algorithms were further externally validated on independent cohorts, yielding encouraging results, especially for the HER2+ subtype.

BERMUDA: a novel deep transfer learning method for single-cell RNA sequencing batch correction reveals hidden high-resolution cellular subtypes (BioMed Central, 2019-08-12)
Wang, Tongxin; Johnson, Travis S.; Shao, Wei; Lu, Zixiao; Helm, Bryan R.; Zhang, Jie; Huang, Kun; Medical and Molecular Genetics, School of Medicine
To fully utilize the power of single-cell RNA sequencing (scRNA-seq) technologies for identifying cell lineages and bona fide transcriptional signals, it is necessary to combine data from multiple experiments. We present BERMUDA (Batch Effect ReMoval Using Deep Autoencoders), a novel transfer-learning-based method for batch effect correction in scRNA-seq data. BERMUDA effectively combines different batches of scRNA-seq data with vastly different cell population compositions and amplifies biological signals by transferring information among batches. We demonstrate that BERMUDA outperforms existing methods for removing batch effects and distinguishing cell types in multiple simulated and real scRNA-seq datasets.

Development and interpretation of a pathomics-based model for the prediction of microsatellite instability in Colorectal Cancer (Ivyspring International Publisher, 2020-09-02)
Cao, Rui; Yang, Fan; Ma, Si-Cong; Liu, Li; Zhao, Yu; Li, Yan; Wu, De-Hua; Wang, Tongxin; Lu, Wei-Jia; Cai, Wei-Jing; Zhu, Hong-Bo; Guo, Xue-Jun; Lu, Yu-Wen; Kuang, Jun-Jie; Huan, Wen-Jing; Tang, Wei-Min; Huang, Kun; Huang, Junzhou; Yao, Jianhua; Dong, Zhong-Yi; Biostatistics, School of Public Health
Microsatellite instability (MSI) has been approved as a pan-cancer biomarker for immune checkpoint blockade (ICB) therapy. However, current MSI identification methods are not available for all patients.
We proposed an ensemble multiple instance deep learning model to predict microsatellite status based on histopathology images, and interpreted the pathomics-based model with multi-omics correlation.
Methods: Two cohorts of patients were collected, including 429 from The Cancer Genome Atlas (TCGA-COAD) and 785 from an Asian colorectal cancer (CRC) cohort (Asian-CRC). We established the pathomics model, named Ensembled Patch Likelihood Aggregation (EPLA), based on two consecutive stages: patch-level prediction and WSI-level prediction. The initial model was developed and validated in TCGA-COAD, and then generalized in Asian-CRC through transfer learning. The pathological signatures extracted from the model were analyzed with genomic and transcriptomic profiles for model interpretation.
Results: The EPLA model achieved an area-under-the-curve (AUC) of 0.8848 (95% CI: 0.8185-0.9512) in the TCGA-COAD test set and an AUC of 0.8504 (95% CI: 0.7591-0.9323) in the external validation set Asian-CRC after transfer learning. Notably, EPLA captured the relationship between the pathological phenotype of poor differentiation and MSI (P < 0.001). Furthermore, the five pathological imaging signatures identified from the EPLA model were associated with mutation burden and DNA damage repair related genotype in the genomic profiles, and antitumor immunity activated pathway in the transcriptomic profiles.
Conclusions: Our pathomics-based deep learning model can effectively predict MSI from histopathology images and is transferable to a new patient cohort.
The interpretability of our model, established through its association with pathological, genomic and transcriptomic phenotypes, lays the foundation for prospective clinical trials of the application of this artificial intelligence (AI) platform in ICB therapy.

Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease (BMC, 2022-02-01)
Johnson, Travis S.; Yu, Christina Y.; Huang, Zhi; Xu, Siwen; Wang, Tongxin; Dong, Chuanpeng; Shao, Wei; Zaid, Mohammad Abu; Huang, Xiaoqing; Wang, Yijie; Bartlett, Christopher; Zhang, Yan; Walker, Brian A.; Liu, Yunlong; Huang, Kun; Zhang, Jie; Medicine, School of Medicine
We propose DEGAS (Diagnostic Evidence GAuge of Single cells), a novel deep transfer learning framework, to transfer disease information from patients to cells. We call such transferable information "impressions," which allow individual cells to be associated with disease attributes like diagnosis, prognosis, and response to therapy. Using simulated data and ten diverse single-cell and patient bulk tissue transcriptomic datasets from glioblastoma multiforme (GBM), Alzheimer's disease (AD), and multiple myeloma (MM), we demonstrate the feasibility, flexibility, and broad applications of the DEGAS framework. DEGAS analysis of myeloma single-cell transcriptomics identified PHF19-high myeloma cells associated with progression.

Generalized gene co-expression analysis via subspace clustering using low-rank representation (BioMed Central, 2019-05-01)
Wang, Tongxin; Zhang, Jie; Huang, Kun; Medical and Molecular Genetics, School of Medicine
BACKGROUND: Gene Co-expression Network Analysis (GCNA) helps identify gene modules with potential biological functions and has become a popular method in bioinformatics and biomedical research. However, most current GCNA algorithms use correlation to build gene co-expression networks and identify modules with highly correlated genes.
There is a need to look beyond correlation and identify gene modules using other similarity measures for finding novel biologically meaningful modules.
RESULTS: We propose a new generalized gene co-expression analysis algorithm via subspace clustering that can identify biologically meaningful gene co-expression modules with genes that are not all highly correlated. We use low-rank representation to construct gene co-expression networks and local maximal quasi-clique merger to identify gene co-expression modules. We applied our method to three large microarray datasets and a single-cell RNA sequencing dataset. We demonstrate that our method can identify gene modules with different biological functions than current GCNA methods and find gene modules with prognostic value.
CONCLUSIONS: The presented method takes advantage of subspace clustering to generate gene co-expression networks rather than using correlation as the similarity measure between genes. Our generalized GCNA method can provide new insights from gene expression datasets and serve as a complement to current GCNA algorithms.

LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection (Oxford Academic, 2019-04)
Johnson, Travis S.; Wang, Tongxin; Huang, Zhi; Yu, Christina Y.; Wu, Yi; Han, Yatong; Zhang, Yan; Huang, Kun; Zhang, Jie; Medicine, School of Medicine
Motivation: Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to (i) remove systematic biases, (ii) model multiple datasets with ambiguous labels and (iii) classify cells and map cell type labels. However, most methods only address one of these problems on broad cell types or simulated data using a single model type.
It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously and generalizability beyond a single model type.
Results: We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied our framework to simulated, pancreas and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy: simulation 1 datasets, 90%; simulation 2 datasets, 94%; pancreas datasets, 88%; brain datasets, 66%) using a LAmbDA one-layer feedforward neural network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes than two other state-of-the-art methods, scmap and CaSTLe, on the brain datasets (66% versus 60% and 32%). Furthermore, it correctly predicted ambiguous cellular subtype labels across datasets in 88% of test cases, compared with CaSTLe (63%), scmap (50%) and MetaNeighbor (50%). LAmbDA is model- and dataset-independent and generalizable to diverse data types, representing an advance in biocomputing.

Machine Learning Based Classification from Whole-Slide Histopathological Images Enables Reliable and Interpretable Diagnosis of Inverted Urothelial Papilloma (Elsevier, 2021-11-05)
Shao, Wei; Cheng, Michael; Huang, Zhi; Han, Zhi; Wang, Tongxin; Lopez-Beltran, Antonio; Osunkoya, Adeboye O.; Zhang, Jie; Cheng, Liang; Huang, Kun; Medicine, School of Medicine
Inverted urothelial papilloma (IUP) is a benign neoplasm of the urinary tract that accounts for less than 1% of urothelial tumors.
It is diagnostically challenging for pathologists to distinguish histological features of IUP from other subtypes of non-invasive urothelial carcinoma, such as inverted Ta urothelial carcinoma (UCInv) and low-grade Ta urothelial carcinoma (UCLG). Using a machine learning approach, we analyzed the H&E-stained whole-slide histopathological images of 64 IUP (the largest cohort to date), 69 UCInv, and 92 UCLG samples, and propose a reliable, reproducible, and interpretable machine learning pipeline to classify IUP from other non-invasive urothelial carcinomas. The results showed that our method could achieve areas under the ROC curve of 0.913 and 0.920 for classifying IUP from UCInv and UCLG, respectively, which is superior to competing methods, including deep learning-based methods. Testing of the classification models on an external validation dataset confirmed that our model can effectively identify IUP with high accuracy. Our results suggest that the proposed machine learning pipeline can robustly and accurately capture histopathological differences between IUP and other urothelial carcinoma subtypes, which can be extended to identify other rare cancer subtypes with limited samples and has the potential to expand the clinician’s armamentarium for accurate diagnosis.

MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification (Springer Nature, 2021-06-08)
Wang, Tongxin; Shao, Wei; Huang, Zhi; Tang, Haixu; Zhang, Jie; Ding, Zhengming; Huang, Kun; Medicine, School of Medicine
To fully utilize the advances in omics technologies and achieve a more comprehensive understanding of human diseases, novel computational methods are required for integrative analysis of multiple types of omics data. Here, we present a novel multi-omics integrative method named Multi-Omics Graph cOnvolutional NETworks (MOGONET) for biomedical classification.
MOGONET jointly explores omics-specific learning and cross-omics correlation learning for effective multi-omics data classification. We demonstrate that MOGONET outperforms other state-of-the-art supervised multi-omics integrative analysis approaches across different biomedical classification applications using mRNA expression data, DNA methylation data, and microRNA expression data. Furthermore, MOGONET can identify important biomarkers from different omics data types related to the investigated biomedical problems.

SPCS: a spatial and pattern combined smoothing method for spatial transcriptomic expression (Oxford University Press, 2022)
Liu, Yusong; Wang, Tongxin; Duggan, Ben; Sharpnack, Michael; Huang, Kun; Zhang, Jie; Ye, Xiufen; Johnson, Travis S.; Biostatistics and Health Data Science, School of Medicine
High-dimensional, localized ribonucleic acid (RNA) sequencing is now possible owing to recent developments in spatial transcriptomics (ST). ST is based on highly multiplexed sequence analysis and uses barcodes to match the sequenced reads to their respective tissue locations. ST expression data suffer from high noise and dropout events; however, smoothing techniques promise to improve data interpretability prior to downstream analyses. Single-cell RNA sequencing (scRNA-seq) data suffer from similar limitations, and smoothing methods developed for scRNA-seq can only utilize associations in transcriptome space (so-called one-factor smoothing methods). Since they do not account for spatial relationships, these one-factor smoothing methods cannot take full advantage of ST data. In this study, we present a novel two-factor smoothing technique, spatial and pattern combined smoothing (SPCS), that employs the k-nearest neighbor (kNN) technique to utilize information from both transcriptome and spatial relationships.
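To make the idea of two-factor smoothing in the SPCS entry concrete, here is a minimal, hypothetical sketch: each spot's profile is blended with the average of its k nearest neighbors in expression space (the "pattern" factor) and its k nearest neighbors in tissue coordinates (the "spatial" factor). This is not the published SPCS algorithm. The function names, the neighbor counts `k_pattern`/`k_spatial`, the mixing weight `alpha`, the smoothing strength `p`, and the use of plain Euclidean distance are all illustrative assumptions; the authors' implementation is linked from the SPCS entry.

```python
import math
from typing import List, Sequence, Tuple

Profile = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance (an illustrative choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(i: int, points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def mean_profile(rows: Sequence[Sequence[float]]) -> Profile:
    """Gene-wise average of several expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_factor_smooth(
    expr: Sequence[Sequence[float]],        # spots x genes expression matrix
    coords: Sequence[Tuple[float, float]],  # (x, y) tissue position of each spot
    k_pattern: int = 2,   # hypothetical: neighbors taken in expression space
    k_spatial: int = 2,   # hypothetical: neighbors taken in tissue space
    alpha: float = 0.5,   # hypothetical: weight of the spatial factor vs. the pattern factor
    p: float = 0.5,       # hypothetical: overall smoothing strength (0 = no smoothing)
) -> List[Profile]:
    """Blend each spot with the averages of its expression-space and spatial kNN."""
    smoothed = []
    for i, own in enumerate(expr):
        pattern_avg = mean_profile([expr[j] for j in k_nearest(i, expr, k_pattern)])
        spatial_avg = mean_profile([expr[j] for j in k_nearest(i, coords, k_spatial)])
        smoothed.append([
            (1 - p) * o + p * (alpha * s + (1 - alpha) * t)
            for o, s, t in zip(own, spatial_avg, pattern_avg)
        ])
    return smoothed


if __name__ == "__main__":
    expr = [[0.0], [2.0], [10.0]]                    # one gene, three spots
    coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    print(two_factor_smooth(expr, coords, k_pattern=1, k_spatial=1))
    # prints [[1.0], [1.0], [6.0]]
```

Setting `alpha` to 0 reduces the sketch to a one-factor (expression-only) kNN smoother of the kind the abstract contrasts with, which is the intuition behind adding the spatial term.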
Applied to multiple ST slides from pancreatic ductal adenocarcinoma (PDAC), dorsolateral prefrontal cortex (DLPFC) and simulated high-grade serous ovarian cancer (HGSOC) datasets, SPCS produced smoothed ST slides with better separability, partition accuracy and biological interpretability than slides smoothed by preexisting one-factor methods. The source code of SPCS is available on GitHub (https://github.com/Usos/SPCS).

Topological Methods for Visualization and Analysis of High Dimensional Single-Cell RNA Sequencing Data (World Scientific Publishing Company, 2019)
Wang, Tongxin; Johnson, Travis; Zhang, Jie; Huang, Kun; Department of Medical and Molecular Genetics, Indiana University School of Medicine
Single-cell RNA sequencing (scRNA-seq) techniques have been very powerful in analyzing heterogeneous cell populations and identifying cell types. Visualizing scRNA-seq data can help researchers effectively extract meaningful biological information and make new discoveries. While commonly used scRNA-seq visualization methods, such as t-SNE, are useful in detecting cell clusters, they often tear apart the intrinsic continuous structure in gene expression profiles. Topological Data Analysis (TDA) approaches such as Mapper capture the shape of data by representing it as topological networks. TDA approaches are robust to noise and to differences between platforms, while preserving locality and data continuity. Moreover, instead of analyzing the whole dataset, Mapper allows researchers to explore the biological meaning of specific pathways and genes by using different filter functions. In this paper, we applied Mapper to visualize scRNA-seq data. Our method not only captures the clustering structure of cells but also preserves the continuous gene expression topologies of cells. We demonstrated that, combined with gene co-expression network analysis, our method can reveal differential expression patterns of gene co-expression modules along the Mapper visualization.
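The Mapper construction mentioned in the last entry follows a standard recipe: compute a filter value for every point, cover the filter's range with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. The sketch below is a generic, minimal illustration of that recipe, not the authors' pipeline. The names `mapper`, `cover_intervals`, and `eps_components`, the one-dimensional interval cover, and the distance-threshold clustering are assumptions for illustration. In an scRNA-seq setting, `filter_fn` might return, for example, one gene's expression or a pathway score per cell (also an assumption). `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

Point = Sequence[float]


def cover_intervals(lo: float, hi: float, n: int, overlap: float) -> List[Tuple[float, float]]:
    """n equal-width intervals over [lo, hi], each widened by overlap * width on both sides."""
    width = (hi - lo) / n
    pad = overlap * width
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n)]


def eps_components(members: List[int], points: Sequence[Point], eps: float) -> List[Set[int]]:
    """Connected components of the graph joining points at distance <= eps
    (equivalent to cutting a single-linkage dendrogram at height eps)."""
    unvisited = set(members)
    components = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        components.append(component)
    return components


def mapper(points: Sequence[Point], filter_fn: Callable[[Point], float],
           n_intervals: int = 4, overlap: float = 0.25, eps: float = 1.0):
    """Return (nodes, edges): nodes are sets of point indices, edges join nodes sharing points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    if hi == lo:  # constant filter: avoid zero-width intervals
        hi = lo + 1.0
    nodes: List[Set[int]] = []
    for a, b in cover_intervals(lo, hi, n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(eps_components(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


if __name__ == "__main__":
    # A rectangular ring: bottom and top rows joined by one extra point at each end.
    ring = [(float(x), 0.0) for x in range(7)] + [(float(x), 2.0) for x in range(7)]
    ring += [(0.0, 1.0), (6.0, 1.0)]
    nodes, edges = mapper(ring, lambda p: p[0], n_intervals=3, overlap=0.25, eps=1.0)
    print(len(nodes), len(edges))  # prints 4 4: the nodes form a 4-cycle
```

In this toy ring, the middle interval splits into a bottom cluster and a top cluster, and each links to both end clusters, so the loop in the data survives as a loop in the Mapper graph. This is the kind of continuous structure the abstract contrasts with cluster-oriented embeddings such as t-SNE.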