Using Machine Learning Techniques to Improve Static Code Analysis Tools Usefulness

Alikhashashneh, Enas A.

dc.contributor.advisor	Hill, James H.
dc.contributor.author	Alikhashashneh, Enas A.
dc.contributor.other	Al Hasan, Mohammad
dc.contributor.other	Raje, Rajeev R.
dc.contributor.other	Tuceryan, Mihran
dc.date.accessioned	2019-07-25T13:32:07Z
dc.date.available	2019-07-25T13:32:07Z
dc.date.issued	2019-08
dc.degree.date	2019	en_US
dc.degree.grantor	Purdue University	en_US
dc.degree.level	Ph.D.	en_US
dc.description	Indiana University-Purdue University Indianapolis (IUPUI)	en_US
dc.description.abstract	This dissertation proposes an approach that uses Machine Learning (ML) techniques to reduce the cost of manually inspecting the large number of false positive warnings reported by Static Code Analysis (SCA) tools. The proposed approach neither assumes the use of a particular SCA tool nor depends on the programming language in which the target source code or application is written. To reduce the number of false positive warnings, we first evaluated a number of SCA tools in terms of software engineering metrics using synthetic source code named the Juliet test suite. From this evaluation, we concluded that the SCA tools report many false positive warnings that require manual inspection. We then generated a number of datasets from source code that forced the SCA tool to generate true positive, false positive, or false negative warnings. These datasets were used to train four ML classifiers to classify the warnings collected from the synthetic source code. From the experimental results, we observed that the classifier built using the Random Forests (RF) technique outperformed the other classifiers. Lastly, using this classifier and an instance-based transfer learning technique, we ranked a number of warnings aggregated from various open-source software projects. The experimental results show that the proposed approach to reducing the cost of manually inspecting false positive warnings outperformed the random ranking algorithm and was highly correlated with the ranked list generated by the optimal ranking algorithm.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/19942
dc.identifier.uri	http://dx.doi.org/10.7912/C2/2369
dc.language.iso	en	en_US
dc.rights	Attribution 3.0 United States	*
dc.rights.uri	https://creativecommons.org/licenses/by/3.0/us	*
dc.subject	Static code analysis	en_US
dc.subject	Source code metrics	en_US
dc.subject	Machine learning	en_US
dc.subject	False positives	en_US
dc.subject	Reduction	en_US
dc.subject	Instance-based transfer learning technique	en_US
dc.title	Using Machine Learning Techniques to Improve Static Code Analysis Tools Usefulness	en_US
dc.type	Thesis	en
thesis.degree.discipline	Computer & Information Science	en
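
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the ScaWarning type, the score values, and the inspections_to_find_all measure are all hypothetical, chosen only to show how a score-based ranking can be compared against an oracle ("optimal") ordering. In practice the scores would come from a trained classifier such as the Random Forests model the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaWarning:
    """One SCA tool warning (all fields are hypothetical for this sketch)."""
    warning_id: str
    score: float            # assumed classifier estimate that the warning is a true positive
    is_true_positive: bool  # ground truth, known only after manual inspection


def rank_by_score(warnings):
    """Order warnings so the most likely true positives are inspected first."""
    return sorted(warnings, key=lambda w: w.score, reverse=True)


def optimal_ranking(warnings):
    """Oracle ordering: every true positive ahead of every false positive."""
    return sorted(warnings, key=lambda w: w.is_true_positive, reverse=True)


def inspections_to_find_all(ranked):
    """Position of the last true positive when inspecting from the top (0 if none)."""
    positions = [i for i, w in enumerate(ranked, start=1) if w.is_true_positive]
    return positions[-1] if positions else 0


# Made-up example data; the scores are illustrative, not experimental results.
warnings = [
    ScaWarning("w1", 0.20, False),
    ScaWarning("w2", 0.90, True),
    ScaWarning("w3", 0.65, False),
    ScaWarning("w4", 0.70, True),
    ScaWarning("w5", 0.10, False),
]

ranked = rank_by_score(warnings)
print([w.warning_id for w in ranked])
print(inspections_to_find_all(warnings),                   # original tool order
      inspections_to_find_all(ranked),                     # score-based ranking
      inspections_to_find_all(optimal_ranking(warnings)))  # oracle ranking
```

On this toy input the score-based ranking needs as few inspections as the oracle ordering, while the original order needs more. The abstract does not say which cost or correlation measure the dissertation uses, so the measure here should be read as one plausible choice.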

Files

Original bundle

Name: Enas_Alikhashashneh_thesis.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission

Collections

Computer & Information Science Department Theses and Dissertations