Browsing by Author "Chang, Wennan"
Now showing 1 - 10 of 17
Item A data denoising approach to optimize functional clustering of single cell RNA-sequencing data (IEEE, 2020-12) Wan, Changlin; Jia, Dongya; Zhao, Yue; Chang, Wennan; Cao, Sha; Wang, Xiao; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells' activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data. Application of CIBS on two scRNA-seq datasets collected from cancer tumor micro-environment successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which were not revealed by the existing cell clustering analysis tools.
The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients.

Item Denoising Individual Bias for Fairer Binary Submatrix Detection (ACM, 2020-10) Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Low rank representation of binary matrices is powerful in disentangling sparse individual-attribute associations, and has received wide applications. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d. background noise. However, this assumption could be easily violated in real data, where heterogeneous row- or column-wise probability of binary entries results in disparate element-wise background distribution, and paralyzes the rationality of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely from the background. BIND is supported by thoroughly derived mathematical properties of the row- and column-wise mixture distributions. Our experiments on synthetic and real-world data demonstrated that BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.

Item Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation (AAAI, 2020-06) Wan, Changlin; Chang, Wennan; Zhao, Tong; Li, Mengya; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc.
Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low rank Boolean matrices, which could generate a vast amount of information on the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach, called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopted a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's would lie on the upper triangular area of the permuted matrix, and its location was determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different density and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate density patterns than popular methods such as ASSO, PANDA and Message Passing. We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieving and data denoising.

Item FLUXestimator: a webserver for predicting metabolic flux and variations using transcriptomics data (Oxford University Press, 2023) Zhang, Zixuan; Zhu, Haiqi; Dang, Pengtao; Wang, Jia; Chang, Wennan; Wang, Xiao; Alghamdi, Norah; Lu, Alex; Zang, Yong; Wu, Wenzhuo; Wang, Yijie; Zhang, Yu; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Quantitative assessment of single cell fluxome is critical for understanding the metabolic heterogeneity in diseases.
Unfortunately, laboratory-based single cell fluxomics is currently impractical, and the current computational tools for flux estimation are not designed for single cell-level prediction. Given the well-established link between transcriptomic and metabolomic profiles, leveraging single cell transcriptomics data to predict single cell fluxome is not only feasible but also an urgent task. In this study, we present FLUXestimator, an online platform for predicting metabolic fluxome and variations using single cell or general transcriptomics data of large sample size. The FLUXestimator webserver implements a recently developed unsupervised approach called single cell flux estimation analysis (scFEA), which uses a new neural network architecture to estimate reaction rates from transcriptomics data. To the best of our knowledge, FLUXestimator is the first web-based tool dedicated to predicting cell-/sample-wise metabolic flux and metabolite variations using transcriptomics data of human, mouse and 15 other common experimental organisms. The FLUXestimator webserver is available at http://scFLUX.org/, and stand-alone tools for local use are available at https://github.com/changwn/scFEA. Our tool provides a new avenue for studying metabolic heterogeneity in diseases and has the potential to facilitate the development of new therapeutic strategies.

Item Geometric All-way Boolean Tensor Decomposition (2020) Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Boolean tensors have been broadly utilized in representing high dimensional logical data collected on spatial, temporal and/or other relational domains. Boolean Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors, which is an NP-hard problem. Existing BTD methods have been limited by their high computational cost in applications to large scale or higher order tensors.
In this work, we presented a computationally efficient BTD algorithm, namely Geometric Expansion for all-order Tensor Factorization (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. We conducted rigorous theoretical analysis on the validity as well as algorithmic efficiency of GETF in decomposing all-order tensors. Experiments on both synthetic and real-world data demonstrated that GETF significantly improves reconstruction accuracy and extraction of latent structures, and is an order of magnitude faster than other state-of-the-art methods.

Item ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data (BioRxiv, 2019) Chang, Wennan; Wan, Changlin; Lu, Xiaoyu; Tu, Szu-wei; Sun, Yifan; Zhang, Xinna; Zang, Yong; Zhang, Anru; Huang, Kun; Liu, Yunlong; Lu, Xiongbin; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issue of identifiability and robustness in the current tissue data deconvolution problem. ICTD provides substantially new capabilities for omics data based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types.
ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell type and cell type specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Other than the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown sub cell types, and assessment of cell type specific functions. The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Item LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data (Oxford University Press, 2019-10-10) Wan, Changlin; Chang, Wennan; Zhang, Yu; Shah, Fenil; Lu, Xiaoyu; Zang, Yong; Zhang, Anru; Cao, Sha; Fishel, Melissa L.; Ma, Qin; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by largely observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model, from the kinetic relationships of the transcriptional regulatory inputs, mRNA metabolism and abundance in single cells. LTMG infers the expression multi-modalities across single cells, while the dropouts and low expressions are treated as left truncated.
We demonstrated that LTMG has significantly better goodness of fit on an extensive number of scRNA-seq data sets, compared to three other state-of-the-art models. Our biological assumption of the low non-zero expressions, rationality of the multimodality setting, and the capability of LTMG in extracting expression states specific to cell types or functions, are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity, compared with five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.

Item M3S: a comprehensive model selection for multi-modal single-cell RNA sequencing data (BMC, 2019-12-20) Zhang, Yu; Wan, Changlin; Wang, Pengcheng; Chang, Wennan; Huo, Yan; Chen, Jian; Ma, Qin; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Background: Various statistical models have been developed to model the single cell RNA-seq expression profiles, capture their multimodality, and conduct differential gene expression tests. However, for expression data generated by different experimental designs and platforms, there is currently a lack of capability to determine the most proper statistical model. Results: We developed an R package, namely Multi-Modal Model Selection (M3S), for gene-wise selection of the most proper multi-modality statistical model and downstream analysis, useful in single-cell or large scale bulk tissue transcriptomic data.
M3S features (1) gene-wise selection of the most parsimonious model among the 11 most commonly utilized ones that can best fit the expression distribution of the gene, (2) parameter estimation of a selected model, and (3) differential gene expression test based on the selected model. Conclusion: A comprehensive evaluation suggested that M3S can accurately capture the multimodality on simulated and real single cell data. An open source package is available through GitHub at https://github.com/zy26/M3S.

Item Modulation of Immune Infiltration of Ovarian Cancer Tumor Microenvironment by Specific Subpopulations of Fibroblasts (MDPI, 2020-10-29) Wang, Ji; Cheng, Frank H. C.; Tedrow, Jessica; Chang, Wennan; Zhang, Chi; Mitra, Anirban K.; Medical and Molecular Genetics, School of Medicine
Simple Summary: The ovarian cancer tumor microenvironment is made up of ovarian cancer cells along with a milieu of proteins and normal cells, including fibroblasts, immune cells, endothelial cells, pericytes and adipocytes. The noncancer components also play an important role in determining the fate of the tumor and exhibit a lot of heterogeneity. In this study, we have used a deconvolution algorithm to identify four different fibroblast subpopulations and multiple immune cell types, from bulk RNA-seq data of ovarian cancer primary tumors, metastases and normal omentum. We report the prevalence of specific fibroblast subtypes that determine the tumor-immune microenvironment. Our study can potentially help provide a template for identification of potential combination therapies to enhance the efficacy of ovarian cancer immunotherapies.
Abstract: Tumor immune infiltration plays a key role in the progression of solid tumors, including ovarian cancer, and immunotherapies are rapidly emerging as effective treatment modalities.
However, the role of cancer-associated fibroblasts (CAFs), a predominant stromal constituent, in determining the tumor-immune microenvironment and modulating efficacy of immunotherapies remains poorly understood. We have conducted an extensive bioinformatic analysis of our and other publicly available ovarian cancer datasets (GSE137237, GSE132289 and GSE71340), to determine the correlation of fibroblast subtypes within the tumor microenvironment (TME) with the characteristics of tumor-immune infiltration. We identified (1) four functional modules of CAFs in ovarian cancer that are associated with the TME and metastasis of ovarian cancer, (2) immune-suppressive function of the collagen 1,3,5-expressing CAFs in primary ovarian cancer and omental metastases, and (3) consistent positive correlations between the functional modules of CAFs with anti-immune response genes and negative correlation with pro-immune response genes. Our study identifies a specific fibroblast subtype, fibroblast functional module (FFM)2, in the ovarian cancer tumor microenvironment that can potentially modulate a tumor-promoting immune microenvironment, which may be detrimental toward the effectiveness of ovarian cancer immunotherapies.

Item Physioxia-induced downregulation of Tet2 in hematopoietic stem cells contributes to enhanced self-renewal (American Society of Hematology, 2022) Aljoufi, Arafat; Zhang, Chi; Ropa, James; Chang, Wennan; Palam, Lakshmi Reddy; Cooper, Scott; Ramdas, Baskar; Capitano, Maegan L.; Broxmeyer, Hal E.; Kapur, Reuben; Microbiology and Immunology, School of Medicine
Hematopoietic stem cells (HSCs) manifest impaired recovery and self-renewal with a concomitant increase in differentiation when exposed to ambient air as opposed to physioxia. Mechanism(s) behind this distinction are poorly understood but have the potential to improve stem cell transplantation.
Single-cell RNA sequencing of HSCs in physioxia revealed upregulation of HSC self-renewal genes and downregulation of genes involved in inflammatory pathways and HSC differentiation. HSCs under physioxia also exhibited downregulation of the epigenetic modifier Tet2. Tet2 is an α-ketoglutarate-, iron- and oxygen-dependent dioxygenase that converts 5-methylcytosine to 5-hydroxymethylcytosine, thereby promoting active transcription. We evaluated whether loss of Tet2 affects the number and function of HSCs and hematopoietic progenitor cells (HPCs) under physioxia and ambient air. In contrast to wild-type HSCs (WT HSCs), a complete nonresponsiveness of Tet2-/- HSCs and HPCs to changes in oxygen tension was observed. Unlike WT HSCs, Tet2-/- HSCs and HPCs exhibited similar numbers and function in either physioxia or ambient air. The lack of response to changes in oxygen tension in Tet2-/- HSCs was associated with similar changes in self-renewal and quiescence genes among WT HSC-physioxia, Tet2-/- HSC-physioxia and Tet2-/- HSC-air. We define a novel molecular program involving Tet2 in regulating HSCs under physioxia.