CellTypeR: A Single-Cell Sequencing Marker Gene Tool Suite
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study interindividual cellular heterogeneity, to explore how disease affects the cellular composition of tissue, and to identify novel cell subtypes. However, a major challenge in scRNA-seq analysis is identifying the cell type of individual cells. Accurate and comprehensive cell type assignment is crucial for the validity of any scRNA-seq analysis, because incorrect assignment reduces statistical robustness and may lead to incorrect biological conclusions. With over 200 distinct cell types in humans alone, the space of possible cell identities is large. Even within a single cell type, heterogeneity arises from cell cycle phase, cell state, cell subtype, cell health, and the tissue microenvironment. This makes cell type classification a complicated biological problem that requires bioinformatic methods. One approach to classifying cell type is to use marker genes, which are genes specific to one or a few cell types. When coupled with bioinformatic methods, marker genes show promise for improving cell type classification. However, current scRNA-seq classification methods and databases use marker genes that are non-specific across sources, samples, and/or species, leading to bias and errors. Furthermore, many existing tools require the user to manually provide training datasets or the expected number and names of cell types, which can introduce selection bias. This selection bias reduces the accuracy of cell type classification because the model cannot extrapolate beyond the user inputs, even when it is biologically meaningful to do so. In this dissertation I developed CellTypeR, a suite of tools to explore the biology governing cell identity in a “normal” state for humans and mice. The work presented here accomplishes three aims: (1) develop an ontology-standardized database of published marker gene literature; (2) develop and apply a marker gene classification algorithm; and (3) create a user interface and input data structure for scRNA-seq cell type prediction.