Characterization of intrinsically disordered regions in proteins informed by human genetic diversity

dc.contributor.author: Ahmed, Shehab S.
dc.contributor.author: Rifat, Zaara T.
dc.contributor.author: Lohia, Ruchi
dc.contributor.author: Campbell, Arthur J.
dc.contributor.author: Dunker, A. Keith
dc.contributor.author: Rahman, M. Sohel
dc.contributor.author: Iqbal, Sumaiya
dc.contributor.department: Biochemistry and Molecular Biology, School of Medicine
dc.date.accessioned: 2023-05-30T12:44:37Z
dc.date.available: 2023-05-30T12:44:37Z
dc.date.issued: 2022-03-11
dc.description.abstract: All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms. [en_US]
dc.eprint.version: Final published version [en_US]
dc.identifier.citation: Ahmed SS, Rifat ZT, Lohia R, et al. Characterization of intrinsically disordered regions in proteins informed by human genetic diversity. PLoS Comput Biol. 2022;18(3):e1009911. Published 2022 Mar 11. doi:10.1371/journal.pcbi.1009911 [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/33339
dc.language.iso: en_US [en_US]
dc.publisher: PLOS [en_US]
dc.relation.isversionof: 10.1371/journal.pcbi.1009911 [en_US]
dc.relation.journal: PLOS COMPUTATIONAL BIOLOGY [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.source: PMC [en_US]
dc.subject: Amino acid sequence [en_US]
dc.subject: Genetic variation [en_US]
dc.subject: Intrinsically disordered proteins [en_US]
dc.subject: Protein conformation [en_US]
dc.subject: Proteome [en_US]
dc.title: Characterization of intrinsically disordered regions in proteins informed by human genetic diversity [en_US]
dc.type: Article [en_US]
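
Illustrative note: the abstract reports measuring the statistical enrichment of UniProt features in IDRs, but this record does not say which test was used. The following is a minimal sketch, assuming a one-sided Fisher exact test on residue counts. The function name `enrichment` and all counts are hypothetical and are not taken from the paper.

```python
from math import comb


def enrichment(idr_with, idr_total, other_with, other_total):
    """Odds ratio and one-sided Fisher exact p-value for feature enrichment.

    idr_with / idr_total:     residues carrying the feature / all residues in IDRs
    other_with / other_total: the same counts outside IDRs
    (Assumed inputs for illustration; not the authors' pipeline.)
    """
    a = idr_with                   # in IDR, has feature
    b = idr_total - idr_with       # in IDR, no feature
    c = other_with                 # outside IDR, has feature
    d = other_total - other_with   # outside IDR, no feature

    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    n = a + b + c + d              # all residues
    k = a + c                      # residues carrying the feature
    m = a + b                      # residues inside IDRs
    # Hypergeometric upper tail: P(X >= a) when m residues are drawn from n.
    tail = sum(comb(k, x) * comb(n - k, m - x) for x in range(a, min(k, m) + 1))
    p_value = tail / comb(n, m)
    return odds_ratio, p_value


# Toy counts: 8 of 10 IDR residues carry the feature, 1 of 10 elsewhere.
odds, p = enrichment(8, 10, 1, 10)
print(f"odds ratio = {odds:.1f}, one-sided p = {p:.4g}")
```

An odds ratio above 1 with a small p-value would suggest the feature is over-represented in IDRs; with twenty-five features tested, a multiple-testing correction would also be needed.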
Files
Original bundle
Name: pcbi.1009911.pdf
Size: 3.23 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission