classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows

dc.contributor.advisor: Boukai, Benzion
dc.contributor.author: Key, Melissa Chester
dc.contributor.other: Ragg, Susanne
dc.contributor.other: Katz, Barry
dc.contributor.other: Mosley, Amber
dc.date.accessioned: 2020-05-21T14:04:13Z
dc.date.available: 2020-05-21T14:04:13Z
dc.date.issued: 2020-05
dc.degree.date: 2020 [en_US]
dc.degree.discipline:
dc.degree.grantor: Indiana University [en_US]
dc.degree.level: Ph.D. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to flag misidentified peptides within each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distributions of these two groups of distances. This threshold is then used to create a decision rule for each peptide, based on counting the number of within-protein distances smaller than the threshold. Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. I also apply classCleaner to an LC-MS/MS proteomics data set and to the Congressional Voting Records data set from the UCI Machine Learning Repository. The latter is used to demonstrate that the algorithm is not specific to proteomics. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/22837
dc.identifier.uri: http://dx.doi.org/10.7912/C2/2814
dc.language.iso: en_US [en_US]
dc.subject: class labels [en_US]
dc.subject: classification [en_US]
dc.subject: filtering [en_US]
dc.subject: outliers [en_US]
dc.subject: proteomics [en_US]
dc.title: classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows [en_US]
dc.type: Dissertation
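Illustrative sketch (not taken from the dissertation): the abstract describes a threshold derived from the within-class and between-class distance distributions, followed by a per-item count of within-class distances below that threshold. The Python below is one minimal way to express that idea. The function name `flag_suspect_items`, the Euclidean metric, the midpoint-of-medians threshold, and the `min_fraction` cutoff are all assumptions made for illustration; the record does not state how classCleaner actually chooses them.

```python
from itertools import combinations
from math import dist
from statistics import median


def flag_suspect_items(profiles, labels, min_fraction=0.5):
    """Return one boolean per item; True means the item looks misassigned.

    profiles: list of equal-length numeric sequences (e.g. peptide intensities
              across samples); labels: class label per item (e.g. protein).
    Hypothetical helper; needs at least one same-class pair and one
    different-class pair, otherwise median() raises StatisticsError.
    """
    n = len(profiles)
    # Pairwise Euclidean distances (assumed metric).
    d = [[dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

    pairs = list(combinations(range(n), 2))
    within = [d[i][j] for i, j in pairs if labels[i] == labels[j]]
    between = [d[i][j] for i, j in pairs if labels[i] != labels[j]]

    # Assumed threshold: midpoint between the two empirical medians.
    tau = (median(within) + median(between)) / 2

    flags = []
    for i in range(n):
        peers = [j for j in range(n) if j != i and labels[j] == labels[i]]
        # Count within-class distances that fall below the threshold.
        close = sum(d[i][j] < tau for j in peers)
        # Assumed decision rule: flag when too few peers are close.
        flags.append(bool(peers) and close < min_fraction * len(peers))
    return flags


# Toy data: the sixth item carries label "A" but sits with the "B" items.
profiles = [
    [0, 0, 0], [0.1, 0, 0], [0, 0.1, 0], [0, 0, 0.1], [0.1, 0.1, 0],
    [10, 10, 10],
    [10, 10, 10.1], [10.1, 10, 10], [10, 10.1, 10], [10.1, 10.1, 10], [10, 10, 10],
]
labels = ["A"] * 6 + ["B"] * 5
print(flag_suspect_items(profiles, labels))
```

On this toy input only the sixth item is flagged, because none of its same-label peers lies within the threshold. A real application would need the threshold and decision rule defined in the dissertation itself.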
Files
Original bundle
Name: Key_iupui_0104D_10440.pdf
Size: 5.46 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission