Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features

Zhao, Huiying; Yang, Yuedong; Lin, Hai; Zhang, Xinjun; Mort, Matthew; Cooper, David N.; Liu, Yunlong; Zhou, Yaoqi

dc.contributor.author	Zhao, Huiying
dc.contributor.author	Yang, Yuedong
dc.contributor.author	Lin, Hai
dc.contributor.author	Zhang, Xinjun
dc.contributor.author	Mort, Matthew
dc.contributor.author	Cooper, David N.
dc.contributor.author	Liu, Yunlong
dc.contributor.author	Zhou, Yaoqi
dc.date.accessioned	2015-10-02T13:16:21Z
dc.date.available	2015-10-02T13:16:21Z
dc.date.issued	2013-04-05
dc.description	poster abstract	en_US
dc.description.abstract	Micro-INDELs (insertions or deletions of ≤20 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. A significant portion of exonic INDELs are non-frameshifting (NFS), serving to insert or delete a discrete number of amino-acid residues. Despite the relative abundance of NFS-INDELs, their damaging effect on protein structure and function has gone largely unstudied, and bioinformatics tools for discriminating between disease-causing and neutral NFS-INDELs remain to be developed. We have developed such a technique (DDIG-in; Detecting DIsease-causing Genetic variations due to INDELs) by comparing the properties of disease-causing NFS-INDELs from the Human Gene Mutation Database (HGMD) with putatively neutral NFS-INDELs from the 1000 Genomes Project. Having considered 58 different sequence- and structure-based features, we found that predicted disordered regions around the NFS-INDEL region had the highest discriminative capability (disease versus neutral), with an Area Under the receiver-operating characteristic Curve (AUC) of 0.82 and a Matthews Correlation Coefficient (MCC) of 0.56. All features studied were combined by support vector machines (SVM) and selected by a greedy algorithm. The resulting SVM models were trained and tested by ten-fold cross-validation on the microdeletion dataset and independently tested on the microinsertion dataset, and vice versa. The final SVM model for determining NFS-INDEL disease-causing probability was built on non-redundant datasets with a protein sequence identity cutoff of 35% and yielded an MCC value of 0.68, an accuracy of 84% and an AUC of 0.89. Predicted disease-causing probabilities exhibited a strong negative correlation with the average minor allele frequency (correlation coefficient, -0.84). DDIG-in, available at http://sparks.informatics.iupui.edu, can be used to estimate the disease-causing probability for a given NFS-INDEL.	en_US
dc.identifier.citation	Zhao, Huiying, Yuedong Yang, Hai Lin, Xinjun Zhang, Matthew Mort, David N. Cooper, Yunlong Liu, and Yaoqi Zhou. (2013, April 5). Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features. Poster session presented at IUPUI Research Day 2013, Indianapolis, Indiana.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/7109
dc.language.iso	en_US	en_US
dc.publisher	Office of the Vice Chancellor for Research	en_US
dc.subject	micro-INDELs	en_US
dc.subject	human gene mutation	en_US
dc.subject	non-frameshifting exonic INDELs	en_US
dc.subject	disease-causing non-frameshifting INDELs	en_US
dc.subject	neutral non-frameshifting INDELs	en_US
dc.subject	Detecting DIsease-causing Genetic variations due to INDELs	en_US
dc.title	Discriminating between disease-causing and neutral non-frameshifting micro-INDELs by support vector machines by means of integrated sequence- and structure-based features	en_US
dc.type	Poster	en_US
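
The abstract reports classifier quality as accuracy, MCC, and AUC. As a reading aid, the sketch below shows how these three metrics are conventionally computed for a binary disease-versus-neutral classifier. It is not the DDIG-in implementation: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values rather than data from the poster.

```python
from math import sqrt

def confusion_counts(labels, predictions):
    """Return (TP, TN, FP, FN) for binary labels: 1 = disease-causing, 0 = neutral."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (invented values, NOT results from the poster).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]  # hypothetical predicted probabilities
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(labels, predictions)
print("accuracy:", accuracy(tp, tn, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
print("AUC:", auc(labels, scores))
```

Accuracy and MCC depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is why an abstract can report all three for the same model.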

Files

Original bundle

Name: Zhao-discriminating.pdf
Size: 48.23 KB
Format: Adobe Portable Document Format

Collections

IUPUI Research Day 2013