LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection

dc.contributor.author: Johnson, Travis S.
dc.contributor.author: Wang, Tongxin
dc.contributor.author: Huang, Zhi
dc.contributor.author: Yu, Christina Y.
dc.contributor.author: Wu, Yi
dc.contributor.author: Han, Yatong
dc.contributor.author: Zhang, Yan
dc.contributor.author: Huang, Kun
dc.contributor.author: Zhang, Jie
dc.contributor.department: Medicine, School of Medicine [en_US]
dc.date.accessioned: 2022-01-12T21:52:00Z
dc.date.available: 2022-01-12T21:52:00Z
dc.date.issued: 2019-04
dc.description.abstract: Motivation: Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to (i) remove systematic biases, (ii) model multiple datasets with ambiguous labels and (iii) classify cells and map cell type labels. However, most methods only address one of these problems on broad cell types or simulated data using a single model type. It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously and generalizability beyond a single model type. Results: We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied our framework on simulated, pancreas and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy simulated 1 datasets: 90%; simulated 2 datasets: 94%; pancreas datasets: 88% and brain datasets: 66%) using LAmbDA Feedforward 1 Layer Neural Network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes than two other state-of-the-art methods, scmap and CaSTLe in brain (66% versus 60% and 32%). Furthermore, it achieved better performance in correctly predicting ambiguous cellular subtype labels across datasets in 88% of test cases compared with CaSTLe (63%), scmap (50%) and MetaNeighbor (50%). LAmbDA is model- and dataset-independent and generalizable to diverse data types representing an advance in biocomputing. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Johnson, T. S., Wang, T., Huang, Z., Yu, C. Y., Wu, Y., Han, Y., Zhang, Y., Huang, K., & Zhang, J. (2019). LAmbDA: Label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection. Bioinformatics, 35(22), 4696–4706. https://doi.org/10.1093/bioinformatics/btz295 [en_US]
dc.identifier.issn: 1367-4803, 1460-2059 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/27408
dc.language.iso: en_US [en_US]
dc.publisher: Oxford Academic [en_US]
dc.relation.isversionof: 10.1093/bioinformatics/btz295 [en_US]
dc.relation.journal: Bioinformatics [en_US]
dc.rights: Publisher Policy [en_US]
dc.source: PMC [en_US]
dc.subject: RNA [en_US]
dc.subject: LAmbDA [en_US]
dc.subject: scRNA-seq [en_US]
dc.title: LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection [en_US]
dc.type: Article [en_US]
ul.alternative.fulltext: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853662/ [en_US]
Files
Original bundle
Name: LAmbDA.pdf
Size: 1.49 MB
Format: Adobe Portable Document Format
Description: Article