Authors: Wan, Changlin; Jia, Dongya; Zhao, Yue; Chang, Wennan; Cao, Sha; Wang, Xiao; Zhang, Chi
Dates: 2022-03-25; 2022-03-25; 2020-12
Citation: Wan, C., Jia, D., Zhao, Y., Chang, W., Cao, S., Wang, X., & Zhang, C. (2020). A data denoising approach to optimize functional clustering of single cell RNA-sequencing data. 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 217–222. https://doi.org/10.1109/BIBM49941.2020.9313483
Handle: https://hdl.handle.net/1805/28308
Abstract: Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can resolve different cell (sub)types at high resolution, identifying the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods for handling the inter-subject variation that often severely confounds the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modal distribution of the cells' activation status, and it applies Boolean matrix factorization to the discretized expression status to robustly derive functional modules. CIBS is powered by a novel fast Boolean matrix factorization method, namely PFAST, which makes it computationally feasible on large-scale scRNA-seq data.
Application of CIBS to two scRNA-seq datasets collected from cancer tumor microenvironments successfully identified subgroups of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which were not revealed by existing cell clustering analysis tools. The identified cell groups were significantly associated with clinically confirmed lymph-node invasion and metastasis events across different patients.
Language: en
Rights: Publisher Policy
Keywords: cell clustering analysis; data denoising; boolean matrix factorization
Title: A data denoising approach to optimize functional clustering of single cell RNA-sequencing data
Type: Article
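
The abstract describes applying Boolean matrix factorization to a discretized expression matrix to derive functional modules. The following is a minimal sketch of that general idea only. It is not CIBS or PFAST, whose details the abstract does not give. It assumes rows are cells and columns are genes, uses a plain threshold as a stand-in for the model-based discretization, and uses a simple greedy cover heuristic for the factorization. All function names (binarize, boolean_product, greedy_bmf) are hypothetical.

```python
import numpy as np

def binarize(expr, threshold=0.0):
    # Assumption: a fixed threshold stands in for model-based discretization.
    return np.asarray(expr) > threshold

def boolean_product(A, B):
    # (A o B)[i, j] = OR over k of (A[i, k] AND B[k, j])
    return (A.astype(int) @ B.astype(int)) > 0

def greedy_bmf(X, rank):
    """Greedy Boolean factorization X ~ A o B (illustrative heuristic only)."""
    X = np.asarray(X, dtype=bool)
    n_cells, n_genes = X.shape
    A = np.zeros((n_cells, rank), dtype=bool)   # cell-to-module membership
    B = np.zeros((rank, n_genes), dtype=bool)   # module-to-gene patterns
    covered = np.zeros_like(X)                  # entries already set to 1
    # Candidate gene patterns: the distinct rows of X.
    candidates = np.unique(X.astype(np.uint8), axis=0).astype(bool)
    for k in range(rank):
        best_gain, best_a, best_b = 0, None, None
        for b in candidates:
            if not b.any():
                continue
            # Per-cell gain: newly covered 1s minus newly introduced false 1s.
            new_ones = (X & ~covered)[:, b].sum(axis=1)
            new_errors = (~X & ~covered)[:, b].sum(axis=1)
            gain = new_ones - new_errors
            a = gain > 0                        # cells that benefit from b
            total = gain[a].sum()
            if total > best_gain:
                best_gain, best_a, best_b = total, a, b
        if best_a is None:                      # no pattern improves the fit
            break
        A[:, k] = best_a
        B[k, :] = best_b
        covered |= best_a[:, None] & best_b[None, :]
    return A, B

# Toy data: cells 0-2 express genes 0-1, cells 3-5 express genes 2-4.
expr = np.array([
    [5, 3, 0, 0, 0],
    [4, 2, 0, 0, 0],
    [6, 1, 0, 0, 0],
    [0, 0, 2, 4, 1],
    [0, 0, 3, 5, 2],
    [0, 0, 1, 6, 3],
])
X = binarize(expr)
A, B = greedy_bmf(X, rank=2)
reconstruction = boolean_product(A, B)
```

In this toy case each column of A marks a group of cells and the matching row of B marks the genes that group shares, which is the sense in which a Boolean factorization yields modules. Real scRNA-seq data is far larger and noisier, so a method designed for that scale would be needed in practice.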