Browsing by Author "Chang, Wennan"
Now showing 1 - 10 of 20
Item A graph neural network model to estimate cell-wise metabolic flux using single-cell RNA-seq data (Cold Spring Harbor Laboratory, 2021)
Alghamdi, Norah; Chang, Wennan; Dang, Pengtao; Lu, Xiaoyu; Wan, Changlin; Gampala, Silpa; Huang, Zhi; Wang, Jiashi; Ma, Qin; Zang, Yong; Fishel, Melissa; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
The metabolic heterogeneity and metabolic interplay between cells are known to be significant contributors to disease treatment resistance. However, lacking a mature high-throughput single-cell metabolomics technology, we have yet to establish a systematic understanding of intra-tissue metabolic heterogeneity and cooperative mechanisms. To mitigate this knowledge gap, we developed a novel computational method, namely, single-cell flux estimation analysis (scFEA), to infer the cell-wise fluxome from single-cell RNA-sequencing (scRNA-seq) data. scFEA is empowered by a systematically reconstructed human metabolic map as a factor graph, a novel probabilistic model to leverage the flux balance constraints on scRNA-seq data, and a novel graph neural network-based optimization solver. The intricate information cascade from transcriptome to metabolome was captured using multilayer neural networks to recapitulate the nonlinear dependency between enzymatic gene expressions and reaction rates. We experimentally validated scFEA by generating an scRNA-seq data set with matched metabolomics data on cells of perturbed oxygen and genetic conditions. Application of scFEA on this data set showed the consistency between predicted flux and the observed variation of metabolite abundance in the matched metabolomics data. We also applied scFEA on five publicly available scRNA-seq and spatial transcriptomics data sets and identified context- and cell group-specific metabolic variations.
The cell-wise fluxome predicted by scFEA empowers a series of downstream analyses including identification of metabolic modules or cell groups that share common metabolic variations, sensitivity evaluation of enzymes with regard to their impact on the whole metabolic flux, and inference of cell-tissue and cell-cell metabolic communications.

Item Acid–base Homeostasis and Implications to the Phenotypic Behaviors of Cancer (Elsevier, 2023)
Zhou, Yi; Chang, Wennan; Lu, Xiaoyu; Wang, Jin; Zhang, Chi; Xu, Ying; Medical and Molecular Genetics, School of Medicine
Acid-base homeostasis is a fundamental property of living cells, and its persistent disruption in human cells can lead to a wide range of diseases. In this study, we conducted a computational modeling analysis of transcriptomic data of 4750 human tissue samples of 9 cancer types in The Cancer Genome Atlas (TCGA) database. Building on our previous study, we quantitatively estimated the average production rate of OH- by cytosolic Fenton reactions, which continuously disrupt the intracellular pH (pHi) homeostasis. Our predictions indicate that all or at least a subset of 43 reprogrammed metabolisms (RMs) are induced to produce net protons (H+) at rates comparable to those of Fenton reactions to keep the pHi stable. We then discovered that a number of well-known phenotypes of cancers, including increased growth rate, metastasis rate, and local immune cell composition, can be naturally explained in terms of the Fenton reaction level and the induced RMs. This study strongly suggests the possibility of a unified framework for studies of cancer-inducing stressors, adaptive metabolic reprogramming, and cancerous behaviors.
In addition, we provide strong evidence that the popular view that Na+/H+ exchangers, along with lactic acid exporters and carbonic anhydrases, are responsible for the intracellular alkalization and extracellular acidification in cancer may not be justified.

Item A data denoising approach to optimize functional clustering of single cell RNA-sequencing data (IEEE, 2020-12)
Wan, Changlin; Jia, Dongya; Zhao, Yue; Chang, Wennan; Cao, Sha; Wang, Xiao; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells' activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large-scale scRNA-seq data.
Application of CIBS on two scRNA-seq datasets collected from the tumor microenvironment successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which were not revealed by the existing cell clustering analysis tools. The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients.

Item Denoising Individual Bias for Fairer Binary Submatrix Detection (ACM, 2020-10)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Low-rank representation of binary matrices is powerful in disentangling sparse individual-attribute associations, and has been widely applied. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d. background noise. However, this assumption could be easily violated in real data, where heterogeneous row- or column-wise probability of binary entries results in disparate element-wise background distribution, and undermines the rationale of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely from the background. BIND is supported by thoroughly derived mathematical properties of the row- and column-wise mixture distributions.
Our experiments on synthetic and real-world data demonstrated that BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.

Item Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation (AAAI, 2020-06)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Li, Mengya; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Boolean matrices have been used to represent digital information in many fields, including bank transactions, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low-rank Boolean matrices, which could generate a vast amount of information on the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach, called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopts a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutes the rows and columns such that the permuted matrix is approximately Upper Triangular-Like (UTL) with the so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1's would lie in the upper triangular area of the permuted matrix, and its location is determined based on a geometric segmentation of a triangle. We compared MEBF with other state-of-the-art approaches on data scenarios with different density and noise levels. MEBF demonstrated superior performance, with lower reconstruction error, higher computational efficiency, and more accurate density patterns than popular methods such as ASSO, PANDA and Message Passing.
We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieval and data denoising.

Item FLUXestimator: a webserver for predicting metabolic flux and variations using transcriptomics data (Oxford University Press, 2023)
Zhang, Zixuan; Zhu, Haiqi; Dang, Pengtao; Wang, Jia; Chang, Wennan; Wang, Xiao; Alghamdi, Norah; Lu, Alex; Zang, Yong; Wu, Wenzhuo; Wang, Yijie; Zhang, Yu; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Quantitative assessment of the single cell fluxome is critical for understanding the metabolic heterogeneity in diseases. Unfortunately, laboratory-based single cell fluxomics is currently impractical, and the current computational tools for flux estimation are not designed for single cell-level prediction. Given the well-established link between transcriptomic and metabolomic profiles, leveraging single cell transcriptomics data to predict the single cell fluxome is not only feasible but also an urgent task. In this study, we present FLUXestimator, an online platform for predicting metabolic fluxome and variations using single cell or general transcriptomics data with large sample sizes. The FLUXestimator webserver implements a recently developed unsupervised approach called single cell flux estimation analysis (scFEA), which uses a new neural network architecture to estimate reaction rates from transcriptomics data. To the best of our knowledge, FLUXestimator is the first web-based tool dedicated to predicting cell-/sample-wise metabolic flux and metabolite variations using transcriptomics data of human, mouse and 15 other common experimental organisms. The FLUXestimator webserver is available at http://scFLUX.org/, and stand-alone tools for local use are available at https://github.com/changwn/scFEA.
Our tool provides a new avenue for studying metabolic heterogeneity in diseases and has the potential to facilitate the development of new therapeutic strategies.

Item Geometric All-way Boolean Tensor Decomposition (2020)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Boolean tensors have been broadly used to represent high dimensional logical data collected on spatial, temporal and/or other relational domains. Boolean Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors, which is an NP-hard problem. Existing BTD methods have been limited by their high computational cost in applications to large-scale or higher-order tensors. In this work, we present a computationally efficient BTD algorithm, namely Geometric Expansion for all-order Tensor Factorization (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. We conducted rigorous theoretical analysis on the validity as well as the algorithmic efficiency of GETF in decomposing all-order tensors. Experiments on both synthetic and real-world data demonstrated that GETF significantly improves reconstruction accuracy and the extraction of latent structures, and is an order of magnitude faster than other state-of-the-art methods.

Item ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data (BioRxiv, 2019)
Chang, Wennan; Wan, Changlin; Lu, Xiaoyu; Tu, Szu-wei; Sun, Yifan; Zhang, Xinna; Zang, Yong; Zhang, Anru; Huang, Kun; Liu, Yunlong; Lu, Xiongbin; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issue of identifiability and robustness in the current tissue data deconvolution problem.
ICTD provides substantially new capabilities for omics data based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell type specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Beyond the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type specific functions.
The promise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Item LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data (Oxford University Press, 2019-10-10)
Wan, Changlin; Chang, Wennan; Zhang, Yu; Shah, Fenil; Lu, Xiaoyu; Zang, Yong; Zhang, Anru; Cao, Sha; Fishel, Melissa L.; Ma, Qin; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by largely observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model from the kinetic relationships of the transcriptional regulatory inputs, mRNA metabolism and abundance in single cells. LTMG infers the expression multi-modalities across single cells, while the dropouts and low expressions are treated as left truncated. We demonstrated that LTMG has a significantly better goodness of fit on a large number of scRNA-seq data sets compared with three other state-of-the-art models. Our biological assumption of the low non-zero expressions, the rationality of the multimodality setting, and the capability of LTMG in extracting expression states specific to cell types or functions are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity compared with five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations.
A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.

Item M3S: a comprehensive model selection for multi-modal single-cell RNA sequencing data (BMC, 2019-12-20)
Zhang, Yu; Wan, Changlin; Wang, Pengcheng; Chang, Wennan; Huo, Yan; Chen, Jian; Ma, Qin; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Background: Various statistical models have been developed to model single cell RNA-seq expression profiles, capture their multimodality, and conduct differential gene expression tests. However, for expression data generated by different experimental designs and platforms, there is currently no way to determine the most appropriate statistical model.
Results: We developed an R package, namely Multi-Modal Model Selection (M3S), for gene-wise selection of the most appropriate multi-modality statistical model and downstream analysis, applicable to single-cell or large-scale bulk tissue transcriptomic data. M3S features (1) gene-wise selection of the most parsimonious model, among the 11 most commonly utilized ones, that best fits the expression distribution of the gene, (2) parameter estimation of a selected model, and (3) a differential gene expression test based on the selected model.
Conclusion: A comprehensive evaluation suggested that M3S can accurately capture the multimodality in simulated and real single cell data. An open-source package is available through GitHub at https://github.com/zy26/M3S.