Browsing by Author "Wan, Changlin"
Now showing 1 - 10 of 18
Item A graph neural network model to estimate cell-wise metabolic flux using single-cell RNA-seq data (Cold Spring Harbor Laboratory, 2021)
Alghamdi, Norah; Chang, Wennan; Dang, Pengtao; Lu, Xiaoyu; Wan, Changlin; Gampala, Silpa; Huang, Zhi; Wang, Jiashi; Ma, Qin; Zang, Yong; Fishel, Melissa; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
The metabolic heterogeneity and metabolic interplay between cells are known as significant contributors to disease treatment resistance. However, with the lack of a mature high-throughput single-cell metabolomics technology, we are yet to establish systematic understanding of the intra-tissue metabolic heterogeneity and cooperative mechanisms. To mitigate this knowledge gap, we developed a novel computational method, namely, single-cell flux estimation analysis (scFEA), to infer the cell-wise fluxome from single-cell RNA-sequencing (scRNA-seq) data. scFEA is empowered by a systematically reconstructed human metabolic map as a factor graph, a novel probabilistic model to leverage the flux balance constraints on scRNA-seq data, and a novel graph neural network-based optimization solver. The intricate information cascade from transcriptome to metabolome was captured using multilayer neural networks to capitulate the nonlinear dependency between enzymatic gene expressions and reaction rates. We experimentally validated scFEA by generating an scRNA-seq data set with matched metabolomics data on cells of perturbed oxygen and genetic conditions. Application of scFEA on this data set showed the consistency between predicted flux and the observed variation of metabolite abundance in the matched metabolomics data. We also applied scFEA on five publicly available scRNA-seq and spatial transcriptomics data sets and identified context- and cell group-specific metabolic variations.
The cell-wise fluxome predicted by scFEA empowers a series of downstream analyses including identification of metabolic modules or cell groups that share common metabolic variations, sensitivity evaluation of enzymes with regards to their impact on the whole metabolic flux, and inference of cell-tissue and cell-cell metabolic communications.

Item Atractylenolide I enhances responsiveness to immune checkpoint blockade therapy by activating tumor antigen presentation (The American Society for Clinical Investigation, 2021-05-17)
Xu, Hanchen; Van der Jeught, Kevin; Zhou, Zhuolong; Zhang, Lu; Yu, Tao; Sun, Yifan; Li, Yujing; Wan, Changlin; So, Ka Man; Liu, Degang; Frieden, Michael; Fang, Yuanzhang; Mosley, Amber L.; He, Xiaoming; Zhang, Xinna; Sandusky, George E.; Liu, Yunlong; Meroueh, Samy O.; Zhang, Chi; Wijeratne, Aruna B.; Huang, Cheng; Ji, Guang; Lu, Xiongbin; Medical and Molecular Genetics, School of Medicine
One of the primary mechanisms of tumor cell immune evasion is the loss of antigenicity, which arises due to lack of immunogenic tumor antigens as well as dysregulation of the antigen processing machinery. In a screen for small-molecule compounds from herbal medicine that potentiate T cell–mediated cytotoxicity, we identified atractylenolide I (ATT-I), which substantially promotes tumor antigen presentation of both human and mouse colorectal cancer (CRC) cells and thereby enhances the cytotoxic response of CD8+ T cells. Cellular thermal shift assay (CETSA) with multiplexed quantitative mass spectrometry identified the proteasome 26S subunit non–ATPase 4 (PSMD4), an essential component of the immunoproteasome complex, as a primary target protein of ATT-I. Binding of ATT-I with PSMD4 augments the antigen-processing activity of immunoproteasome, leading to enhanced MHC-I–mediated antigen presentation on cancer cells.
In syngeneic mouse CRC models and human patient–derived CRC organoid models, ATT-I treatment promotes the cytotoxicity of CD8+ T cells and thus profoundly enhances the efficacy of immune checkpoint blockade therapy. Collectively, we show here that targeting the function of immunoproteasome with ATT-I promotes tumor antigen presentation and empowers T cell cytotoxicity, thus elevating the tumor response to immunotherapy.

Item Bias Aware Probabilistic Boolean Matrix Factorization (PMLR, 2022-08)
Wan, Changlin; Dang, Pengtao; Zhao, Tong; Zang, Yong; Zhang, Chi; Cao, Sha; Biostatistics, School of Public Health
Boolean matrix factorization (BMF) is a combinatorial problem arising from a wide range of applications including recommendation system, collaborative filtering, and dimensionality reduction. Currently, the noise model of existing BMF methods is often assumed to be homoscedastic; however, in real world data scenarios, the deviations of observed data from their true values are almost surely diverse due to stochastic noises, making each data point not equally suitable for fitting a model. In this case, it is not ideal to treat all data points as equally distributed. Motivated by such observations, we introduce a probabilistic BMF model that recognizes the object- and feature-wise bias distribution respectively, called bias aware BMF (BABF). To the best of our knowledge, BABF is the first approach for Boolean decomposition with consideration of the feature-wise and object-wise bias in binary data. We conducted experiments on datasets with different levels of background noise, bias level, and sizes of the signal patterns, to test the effectiveness of our method in various scenarios.
We demonstrated that our model outperforms the state-of-the-art factorization methods in both accuracy and efficiency in recovering the original datasets, and the inferred bias level is highly significantly correlated with true existing bias in both simulated and real world datasets.

Item A data denoising approach to optimize functional clustering of single cell RNA-sequencing data (IEEE, 2020-12)
Wan, Changlin; Jia, Dongya; Zhao, Yue; Chang, Wennan; Cao, Sha; Wang, Xiao; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust method to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells' activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data.
Application of CIBS on two scRNA-seq datasets collected from cancer tumor micro-environment successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which was not revealed by the existing cell clustering analysis tools. The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients.

Item Denoising Individual Bias for Fairer Binary Submatrix Detection (ACM, 2020-10)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Low rank representation of binary matrix is powerful in disentangling sparse individual-attribute associations, and has received wide applications. Existing binary matrix factorization (BMF) or co-clustering (CC) methods often assume i.i.d background noise. However, this assumption could be easily violated in real data, where heterogeneous row- or column-wise probability of binary entries results in disparate element-wise background distribution, and paralyzes the rationality of existing methods. We propose a binary data denoising framework, namely BIND, which optimizes the detection of true patterns by estimating the row- or column-wise mixture distribution of patterns and disparate background, and eliminating the binary attributes that are more likely from the background. BIND is supported by thoroughly derived mathematical property of the row- and column-wise mixture distributions.
Our experiment on synthetic and real-world data demonstrated BIND effectively removes background noise and drastically increases the fairness and accuracy of state-of-the-art BMF and CC methods.

Item Fast and Efficient Boolean Matrix Factorization by Geometric Segmentation (AAAI, 2020-06)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Li, Mengya; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
Boolean matrix has been used to represent digital information in many fields, including bank transaction, crime records, natural language processing, protein-protein interaction, etc. Boolean matrix factorization (BMF) aims to find an approximation of a binary matrix as the Boolean product of two low rank Boolean matrices, which could generate vast amount of information for the patterns of relationships between the features and samples. Inspired by binary matrix permutation theories and geometric segmentation, we developed a fast and efficient BMF approach, called MEBF (Median Expansion for Boolean Factorization). Overall, MEBF adopted a heuristic approach to locate binary patterns presented as submatrices that are dense in 1's. At each iteration, MEBF permutates the rows and columns such that the permutated matrix is approximately Upper Triangular-Like (UTL) with so-called Simultaneous Consecutive-ones Property (SC1P). The largest submatrix dense in 1 would lie on the upper triangular area of the permutated matrix, and its location was determined based on a geometric segmentation of a triangular. We compared MEBF with other state of the art approaches on data scenarios with different density and noise levels. MEBF demonstrated superior performances in lower reconstruction error, and higher computational efficiency, as well as more accurate density patterns than popular methods such as ASSO, PANDA and Message Passing.
We demonstrated the application of MEBF on both binary and non-binary data sets, and revealed its further potential in knowledge retrieving and data denoising.

Item Geometric All-way Boolean Tensor Decomposition (2020)
Wan, Changlin; Chang, Wennan; Zhao, Tong; Cao, Sha; Zhang, Chi; Biostatistics, School of Public Health
Boolean tensor has been broadly utilized in representing high dimensional logical data collected on spatial, temporal and/or other relational domains. Boolean Tensor Decomposition (BTD) factorizes a binary tensor into the Boolean sum of multiple rank-1 tensors, which is an NP-hard problem. Existing BTD methods have been limited by their high computational cost, in applications to large scale or higher order tensors. In this work, we presented a computationally efficient BTD algorithm, namely Geometric Expansion for all-order Tensor Factorization (GETF), that sequentially identifies the rank-1 basis components for a tensor from a geometric perspective. We conducted rigorous theoretical analysis on the validity as well as algorithmic efficiency of GETF in decomposing all-order tensor. Experiments on both synthetic and real-world data demonstrated that GETF has significantly improved performance in reconstruction accuracy, extraction of latent structures and it is an order of magnitude faster than other state-of-the-art methods.

Item ICTD: A semi-supervised cell type identification and deconvolution method for multi-omics data (BioRxiv, 2019)
Chang, Wennan; Wan, Changlin; Lu, Xiaoyu; Tu, Szu-wei; Sun, Yifan; Zhang, Xinna; Zang, Yong; Zhang, Anru; Huang, Kun; Liu, Yunlong; Lu, Xiongbin; Cao, Sha; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
We developed a novel deconvolution method, namely Inference of Cell Types and Deconvolution (ICTD) that addresses the fundamental issue of identifiability and robustness in current tissue data deconvolution problem.
ICTD provides substantially new capabilities for omics data based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying resident cell and sub types that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue and data set specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell type and cell type specific functions in tissue transcriptomics data, (ii) a semi supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data in the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD on real and single cell simulated tissue data validated that the method has consistently good performance for tissue data coming from different species, tissue microenvironments, and experimental platforms. Other than the new capabilities, ICTD outperformed other state-of-the-art deconvolution methods on prediction accuracy, the resolution of identifiable cell, detection of unknown sub cell types, and assessment of cell type specific functions.
The premise of ICTD also lies in characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.

Item IRIS-FGM: an integrative single-cell RNA-Seq interpretation system for functional gene module analysis (Oxford University Press, 2021)
Chang, Yuzhou; Allen, Carter; Wan, Changlin; Chung, Dongjun; Zhang, Chi; Li, Zihai; Ma, Qin; Medical and Molecular Genetics, School of Medicine
Summary: Single-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of condition-specific functional gene modules (FGM) can help to understand interactive gene networks and complex biological processes in different cell clusters. QUBIC2 is recognized as one of the most efficient and effective biclustering tools for condition-specific FGM identification from scRNA-Seq data. However, its limited availability to a C implementation restricted its application to only a few downstream analysis functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify condition-specific FGMs, predict cell types/clusters, uncover differentially expressed genes and perform pathway enrichment analysis. It is noteworthy that IRIS-FGM can also take Seurat objects as input, facilitating easy integration with the existing analysis pipeline.
Availability and implementation: IRIS-FGM is implemented in the R environment (as of version 3.6) with the source code freely available at https://github.com/BMEngineeR/IRISFGM.

Item LTMG: a novel statistical modeling of transcriptional expression states in single-cell RNA-Seq data (Oxford University Press, 2019-10-10)
Wan, Changlin; Chang, Wennan; Zhang, Yu; Shah, Fenil; Lu, Xiaoyu; Zang, Yong; Zhang, Anru; Cao, Sha; Fishel, Melissa L.; Ma, Qin; Zhang, Chi; Medical and Molecular Genetics, School of Medicine
A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by largely observed zero and low expressions. We developed a left truncated mixture Gaussian (LTMG) model, from the kinetic relationships of the transcriptional regulatory inputs, mRNA metabolism and abundance in single cells. LTMG infers the expression multi-modalities across single cells, meanwhile, the dropouts and low expressions are treated as left truncated. We demonstrated that LTMG has significantly better goodness of fitting on an extensive number of scRNA-seq data, comparing to three other state-of-the-art models. Our biological assumption of the low non-zero expressions, rationality of the multimodality setting, and the capability of LTMG in extracting expression states specific to cell types or functions, are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity, compared with other five popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations.
A user-friendly R package with all the analysis power is available at https://github.com/zy26/LTMGSCA.
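
Illustrative note (not part of the record): the LTMG abstract above describes a Gaussian mixture in which dropouts and low expressions are "treated as left truncated". One plausible reading, sketched below in Python, is a left-censored mixture likelihood: values at or below a cutoff contribute only the probability mass below that cutoff, while values above it contribute the usual mixture density. The function name `ltmg_log_likelihood`, its parameters, and the censoring interpretation are assumptions made for illustration; this is not the LTMGSCA package's API or the authors' implementation.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ltmg_log_likelihood(values, cutoff, weights, means, sigmas):
    """Hypothetical left-censored Gaussian mixture log-likelihood.

    values  : log-scale expression values for one gene across cells
    cutoff  : assumed threshold below which a value is only known
              to be "low" (dropout or low expression)
    weights, means, sigmas : mixture parameters, one entry per component
    """
    components = list(zip(weights, means, sigmas))
    total = 0.0
    for x in values:
        if x <= cutoff:
            # Censored observation: use the mixture mass below the cutoff.
            mass = sum(w * normal_cdf(cutoff, m, s) for w, m, s in components)
        else:
            # Fully observed value: use the mixture density at x.
            mass = sum(w * normal_pdf(x, m, s) for w, m, s in components)
        total += math.log(mass)
    return total
```

In a sketch like this, fitting would mean maximizing the function over the weights, means, and standard deviations for each gene, for example with an EM-style procedure; the fitting details are not given in the abstract and are left out here.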