LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection

Johnson, Travis S.; Wang, Tongxin; Huang, Zhi; Yu, Christina Y.; Wu, Yi; Han, Yatong; Zhang, Yan; Huang, Kun; Zhang, Jie

dc.contributor.author	Johnson, Travis S.
dc.contributor.author	Wang, Tongxin
dc.contributor.author	Huang, Zhi
dc.contributor.author	Yu, Christina Y.
dc.contributor.author	Wu, Yi
dc.contributor.author	Han, Yatong
dc.contributor.author	Zhang, Yan
dc.contributor.author	Huang, Kun
dc.contributor.author	Zhang, Jie
dc.contributor.department	Medicine, School of Medicine	en_US
dc.date.accessioned	2022-01-12T21:52:00Z
dc.date.available	2022-01-12T21:52:00Z
dc.date.issued	2019-04
dc.description.abstract	Motivation: Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to (i) remove systematic biases, (ii) model multiple datasets with ambiguous labels and (iii) classify cells and map cell type labels. However, most methods only address one of these problems on broad cell types or simulated data using a single model type. It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously and generalizability beyond a single model type. Results: We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied it to simulated, pancreas and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy: simulated 1 datasets, 90%; simulated 2 datasets, 94%; pancreas datasets, 88%; brain datasets, 66%) using the LAmbDA Feedforward 1 Layer Neural Network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes than two other state-of-the-art methods, scmap and CaSTLe, in brain (66% versus 60% and 32%). Furthermore, it achieved better performance in correctly predicting ambiguous cellular subtype labels across datasets in 88% of test cases compared with CaSTLe (63%), scmap (50%) and MetaNeighbor (50%). LAmbDA is model- and dataset-independent and generalizable to diverse data types, representing an advance in biocomputing.	en_US
dc.eprint.version	Final published version	en_US
dc.identifier.citation	Johnson, T. S., Wang, T., Huang, Z., Yu, C. Y., Wu, Y., Han, Y., Zhang, Y., Huang, K., & Zhang, J. (2019). LAmbDA: Label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection. Bioinformatics, 35(22), 4696–4706. https://doi.org/10.1093/bioinformatics/btz295	en_US
dc.identifier.issn	1367-4803, 1460-2059	en_US
dc.identifier.uri	https://hdl.handle.net/1805/27408
dc.language.iso	en_US	en_US
dc.publisher	Oxford Academic	en_US
dc.relation.isversionof	10.1093/bioinformatics/btz295	en_US
dc.relation.journal	Bioinformatics	en_US
dc.rights	Publisher Policy	en_US
dc.source	PMC	en_US
dc.subject	RNA	en_US
dc.subject	LAmbDA	en_US
dc.subject	scRNA-seq	en_US
dc.title	LAmbDA: label ambiguous domain adaptation dataset integration reduces batch effects and improves subtype detection	en_US
dc.type	Article	en_US
ul.alternative.fulltext	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853662/	en_US

Files

Original bundle

Name: LAmbDA.pdf
Size: 1.49 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission
Description:

Collections

Open Access Policy Articles
Department of Medicine Works