Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions

dc.contributor.advisor: Dunker, A. Keith
dc.contributor.author: Huang, Fei
dc.contributor.other: Chen, Jake
dc.contributor.other: Hurley, Thomas D., 1961-
dc.contributor.other: Shen, Li
dc.date.accessioned: 2014-10-03T15:21:39Z
dc.date.available: 2014-10-03T15:21:39Z
dc.date.issued: 2014-01
dc.degree.date: 2014 [en_US]
dc.degree.discipline: Department of Biochemistry & Molecular Biology [en]
dc.degree.grantor: Indiana University [en_US]
dc.degree.level: Ph.D. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two central tasks in the study of IDPs are characterizing their functions and identifying them. We therefore carried out three projects to better understand IDPs. In the first project, we propose a method that separates IDPs into different functional groups. We used the CH-CDF plot, which is based on the combined use of two predictors and subclassifies proteins into four groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes, among others, whereas the mixed class is strongly associated with signaling pathways, where having both ordered and disordered regions could be important. The second project addresses determining whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploits the charge and hydropathy features of the protein. This algorithm is simple yet powerful, and its input parameters, charge and hydropathy, are informative and readily interpretable. We found that the choice of hydropathy scale significantly affects prediction accuracy. We therefore sought a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In the third project, we developed a per-residue C-H IDP predictor in which three hydropathy scales are optimized individually, to account for the differences in amino acid composition among three regions of a protein sequence (the N terminus, the C terminus, and the internal region). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions on proteins containing long IDP regions. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/5191
dc.identifier.uri: http://dx.doi.org/10.7912/C2/1802
dc.language.iso: en_US [en_US]
dc.subject: Intrinsically disordered proteins [en_US]
dc.subject: Support vector machine [en_US]
dc.subject: Clustering [en_US]
dc.subject.lcsh: Proteins -- Structure-activity relationships -- Research [en_US]
dc.subject.lcsh: Proteins -- Conformation -- Research [en_US]
dc.subject.lcsh: Proteins -- Denaturation [en_US]
dc.subject.lcsh: Protein folding -- Research [en_US]
dc.subject.lcsh: Support vector machines [en_US]
dc.subject.lcsh: Aggregation (Chemistry) [en_US]
dc.subject.lcsh: Molecular biology -- Data processing -- Research [en_US]
dc.subject.lcsh: Amino acids -- Analysis [en_US]
dc.subject.lcsh: Cellular signal transduction [en_US]
dc.subject.lcsh: Molecular biology -- Mathematics [en_US]
dc.subject.lcsh: Algorithms [en_US]
dc.title: Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions [en_US]
dc.type: Thesis [en_US]
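
Illustrative sketch (not part of the deposited record): the abstract describes the charge-hydropathy plot as a whole-protein classifier that uses only mean net charge and mean hydropathy. The Python below shows roughly how such a classifier could look. It assumes the Kyte-Doolittle hydropathy scale rescaled to [0, 1] and the commonly cited linear boundary <R> = 2.785<H> - 1.151; neither is stated in this record, and the thesis's optimized scale is not reproduced here. The function names are hypothetical.

```python
# Assumed scale: Kyte-Doolittle hydropathy values (NOT the thesis's optimized scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified charge model (assumption): K/R = +1, D/E = -1, all others neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def mean_hydropathy(seq):
    """Mean hydropathy with each residue rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge divided by sequence length."""
    return abs(sum(CHARGE.get(aa, 0) for aa in seq)) / len(seq)


def is_predicted_disordered(seq, slope=2.785, intercept=-1.151):
    """Return True if the protein lies on the disordered side of the C-H boundary.

    The default slope and intercept are the commonly cited boundary values
    (an assumption here, not taken from this record).
    """
    seq = seq.upper()
    h = mean_hydropathy(seq)
    r = mean_net_charge(seq)
    return r > slope * h + intercept


if __name__ == "__main__":
    print(is_predicted_disordered("EEEEEEEEEE"))  # highly charged, low hydropathy
    print(is_predicted_disordered("IIIIIIIIII"))  # uncharged, high hydropathy
```

Swapping in a different hydropathy scale only requires replacing the `KYTE_DOOLITTLE` dictionary and refitting the boundary, which is the kind of substitution the second project in the abstract explores.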
Files
Original bundle
Name: HuangFei_Thesis_Final_2.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format