classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows

Key, Melissa Chester

dc.contributor.advisor	Boukai, Benzion
dc.contributor.author	Key, Melissa Chester
dc.contributor.other	Ragg, Susanne
dc.contributor.other	Katz, Barry
dc.contributor.other	Mosley, Amber
dc.date.accessioned	2020-05-21T14:04:13Z
dc.date.available	2020-05-21T14:04:13Z
dc.date.issued	2020-05
dc.degree.date	2020	en_US
dc.degree.discipline
dc.degree.grantor	Indiana University	en_US
dc.degree.level	Ph.D.	en_US
dc.description	Indiana University-Purdue University Indianapolis (IUPUI)	en_US
dc.description.abstract	Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to detect misidentified peptides within each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distributions of these two groups of distances. This threshold is then used to create a decision rule for each peptide, based on counting the number of within-protein distances smaller than the threshold. Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. I also apply classCleaner to an LC-MS/MS proteomics data set and to the Congressional Voting Records data set from the UCI Machine Learning Repository. The latter is used to demonstrate that the algorithm is not specific to proteomics.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/22837
dc.identifier.uri	http://dx.doi.org/10.7912/C2/2814
dc.language.iso	en_US	en_US
dc.subject	class labels	en_US
dc.subject	classification	en_US
dc.subject	filtering	en_US
dc.subject	outliers	en_US
dc.subject	proteomics	en_US
dc.title	classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows	en_US
dc.type	Thesis
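
The counting rule described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the Euclidean metric, the fixed threshold passed in by the caller, the function names, and the min_fraction cutoff are all assumptions made for illustration. In classCleaner itself the threshold is derived from the estimated distributions of within-protein and between-protein distances, which this sketch does not attempt to reproduce.

```python
import numpy as np


def within_class_counts(X, labels, threshold):
    """For each row of X, count same-label rows closer than `threshold`."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances (assumed metric, chosen for illustration).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Mask of pairs that share a label, excluding each item paired with itself.
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return ((dist < threshold) & same).sum(axis=1)


def flag_suspect(X, labels, threshold, min_fraction=0.5):
    """Flag items with too few close same-label neighbours.

    `min_fraction` is a hypothetical cutoff, not the rule from the source.
    """
    labels = np.asarray(labels)
    counts = within_class_counts(X, labels, threshold)
    # Number of *other* items carrying the same label as each item.
    others = np.array([(labels == lab).sum() - 1 for lab in labels])
    return counts < min_fraction * others


# Toy data: the fourth item is labelled "A" but sits among the "B" items.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0],
     [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
labels = ["A", "A", "A", "A", "B", "B", "B"]
suspect = flag_suspect(X, labels, threshold=1.0)
```

On this toy input only the fourth item is flagged, because none of its within-class distances falls below the threshold. The labels here stand in for proteins and the rows for peptide quantitative profiles; any other labelled data set could be substituted, in line with the abstract's point that the approach is not specific to proteomics.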

Files

Original bundle

Name: Key_iupui_0104D_10440.pdf
Size: 5.46 MB
Format: Adobe Portable Document Format

Collections

Biostatistics Department Theses and Dissertations