Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions
dc.contributor.advisor | Dunker, A. Keith | |
dc.contributor.author | Huang, Fei | |
dc.contributor.other | Chen, Jake | |
dc.contributor.other | Hurley, Thomas D., 1961- | |
dc.contributor.other | Shen, Li | |
dc.date.accessioned | 2014-10-03T15:21:39Z | |
dc.date.available | 2014-10-03T15:21:39Z | |
dc.date.issued | 2014-01 | |
dc.degree.date | 2014 | en_US |
dc.degree.discipline | Department of Biochemistry & Molecular Biology | en |
dc.degree.grantor | Indiana University | en_US |
dc.degree.level | Ph.D. | en_US |
dc.description | Indiana University-Purdue University Indianapolis (IUPUI) | en_US |
dc.description.abstract | Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial subjects in the study of IDPs are analyzing their functions and identifying them. We thus carried out three projects to better understand IDPs. In the first project, we propose a method that separates IDPs into different functional groups. We used the CH-CDF plot, which is based on the combined use of two predictors and subclassifies proteins into four groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes, among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could be important. The second project concerns identifying whether an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploits the charge and hydropathy features of the protein. This algorithm is simple yet powerful, and its input parameters, charge and hydropathy, are informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our third project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually. This accounts for the amino acid composition differences among three regions of a protein sequence (N-terminus, C-terminus, and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions on proteins containing long IDP regions. | en_US |
dc.identifier.uri | https://hdl.handle.net/1805/5191 | |
dc.identifier.uri | http://dx.doi.org/10.7912/C2/1802 | |
dc.language.iso | en_US | en_US |
dc.subject | Intrinsically disordered proteins | en_US |
dc.subject | Support vector machine | en_US |
dc.subject | Clustering | en_US |
dc.subject.lcsh | Proteins -- Structure-activity relationships -- Research | en_US |
dc.subject.lcsh | Proteins -- Conformation -- Research | en_US |
dc.subject.lcsh | Proteins -- Denaturation | en_US |
dc.subject.lcsh | Protein folding -- Research | en_US |
dc.subject.lcsh | Support vector machines | en_US |
dc.subject.lcsh | Aggregation (Chemistry) | en_US |
dc.subject.lcsh | Molecular biology -- Data processing -- Research | en_US |
dc.subject.lcsh | Amino acids -- Analysis | en_US |
dc.subject.lcsh | Cellular signal transduction | en_US |
dc.subject.lcsh | Molecular biology -- Mathematics | en_US |
dc.subject.lcsh | Algorithms | en_US |
dc.title | Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functions | en_US |
dc.type | Thesis | en_US |
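
The charge-hydropathy plot mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the thesis: it uses the Kyte-Doolittle hydropathy scale rescaled to the 0 to 1 range and the commonly cited linear boundary <R> = 2.785<H> - 1.151, whereas the thesis optimizes its own scale. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the thesis derives an optimized one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq, scale=KYTE_DOOLITTLE):
    """Mean hydropathy <H> with the scale linearly rescaled to [0, 1]."""
    if not seq:
        raise ValueError("sequence must not be empty")
    lo, hi = min(scale.values()), max(scale.values())
    # Residues missing from the scale raise KeyError; this sketch does not handle them.
    return sum((scale[aa] - lo) / (hi - lo) for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute mean net charge <R>, counting K/R as +1 and D/E as -1."""
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_plot_is_disordered(seq, slope=2.785, intercept=-1.151):
    """Return True if the sequence falls on the disordered side of the boundary.

    The boundary <R> = slope * <H> + intercept is an assumed default;
    points above the line (high charge, low hydropathy) are called disordered.
    """
    seq = seq.upper()
    h = mean_hydropathy(seq)
    r = mean_net_charge(seq)
    return r > slope * h + intercept
```

For example, ch_plot_is_disordered("E" * 20) evaluates a highly charged, hydrophilic sequence, while ch_plot_is_disordered("I" * 10 + "L" * 10) evaluates a strongly hydrophobic, uncharged one. Swapping in a different dictionary for the scale argument is the point at which an alternative hydropathy scale would plug in.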