classCleaner: A Quantitative Method for Validating Peptide Identification in LC-MS/MS Workflows

Date
2020-05
Language
American English
Embargo Lift Date
Department
Committee Chair
Degree
Ph.D.
Degree Year
2020
Department
Grantor
Indiana University
Journal Title
Journal ISSN
Volume Title
Found At
Abstract

Because label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics infers the peptide sequence of each measurement, there is inherent uncertainty in the identity of each peptide and its originating protein. Removing misidentified peptides can improve the accuracy and power of downstream analyses when differences between proteins are of primary interest. In this dissertation I present classCleaner, a novel algorithm designed to identify misidentified peptides from each protein using the available quantitative data. The algorithm is based on the idea that distances between peptides belonging to the same protein are stochastically smaller than those between peptides in different proteins. The method first determines a threshold based on the estimated distribution of these two groups of distances. This is used to create a decision rule for each peptide based on counting the number of within-protein distances smaller than the threshold. Using simulated data, I show that classCleaner always reduces the proportion of misidentified peptides, with better results for larger proteins (by number of constituent peptides), smaller inherent misidentification rates, and larger sample sizes. ClassCleaner is also applied to a LC-MS/MS proteomics data set and the Congressional Voting Records data set from the UCI machine learning repository. The later is used to demonstrate that the algorithm is not specific to proteomics.

Description
Indiana University-Purdue University Indianapolis (IUPUI)
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Rights
Source
Alternative Title
Type
Dissertation
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Full Text Available at
This item is under embargo {{howLong}}