Using Machine Learning Techniques to Improve Static Code Analysis Tools Usefulness

dc.contributor.advisor: Hill, James H.
dc.contributor.author: Alikhashashneh, Enas A.
dc.contributor.other: Al Hasan, Mohammad
dc.contributor.other: Raje, Rajeev R.
dc.contributor.other: Tuceryan, Mihran
dc.date.accessioned: 2019-07-25T13:32:07Z
dc.date.available: 2019-07-25T13:32:07Z
dc.date.issued: 2019-08
dc.degree.date: 2019 [en_US]
dc.degree.grantor: Purdue University [en_US]
dc.degree.level: Ph.D. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: This dissertation proposes an approach that uses Machine Learning (ML) techniques to reduce, as much as possible, the cost of manually inspecting the false positive warnings reported by Static Code Analysis (SCA) tools. The approach neither assumes particular SCA tools nor depends on the programming language in which the target source code or application is written. To reduce the number of false positive warnings, we first evaluated a number of SCA tools in terms of software engineering metrics using the Juliet test suite, a synthetic source code benchmark whose flaws are known and labeled. From this evaluation, we concluded that the SCA tools report many false positive warnings that require manual inspection. We then generated a number of datasets from source code that caused the SCA tools to produce true positive, false positive, or false negative results. These datasets were used to train four ML classifiers to classify the warnings collected from the synthetic source code. The experimental results showed that the classifier built using the Random Forests (RF) technique outperformed the other classifiers. Finally, using this classifier and an instance-based transfer learning technique, we ranked a number of warnings aggregated from various open-source software projects. The experimental results show that the ranking produced by the proposed approach outperformed the random ranking algorithm and was highly correlated with the ranked list generated by the optimal ranking algorithm, reducing the cost of manually inspecting false positive warnings. [en_US]
dc.identifier.uri: https://hdl.handle.net/1805/19942
dc.identifier.uri: http://dx.doi.org/10.7912/C2/2369
dc.language.iso: en [en_US]
dc.rights: Attribution 3.0 United States [*]
dc.rights.uri: https://creativecommons.org/licenses/by/3.0/us [*]
dc.subject: Static code analysis [en_US]
dc.subject: Source code metrics [en_US]
dc.subject: Machine learning [en_US]
dc.subject: False positives [en_US]
dc.subject: Reduction [en_US]
dc.subject: Instance-based transfer learning technique [en_US]
dc.title: Using Machine Learning Techniques to Improve Static Code Analysis Tools Usefulness [en_US]
dc.type: Thesis [en]
thesis.degree.discipline: Computer & Information Science [en]
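The ranking evaluation summarized in the abstract can be illustrated with a small sketch. It assumes that a classifier has already assigned each warning a score. Here the scores are hypothetical estimated probabilities that a warning is a true positive. The dissertation's Random Forests classifier and instance-based transfer learning step are not reproduced. Warnings are inspected in descending score order. The hypothetical function `inspection_cost` counts how many false positives a reviewer reads before reaching every true positive. `spearman` measures agreement with the optimal ordering (true positives first) using Spearman's rank correlation with tie-averaged ranks. All function names, the data, and the choice of metrics are illustrative assumptions, not the dissertation's exact evaluation.

```python
import random


def rank_by_score(scores):
    """Return warning indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def random_ranking(n, seed=None):
    """Return a random permutation of warning indices (the random baseline)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def inspection_cost(ranking, is_true_positive):
    """Count false positives a reviewer inspects before finding every true positive."""
    remaining = sum(1 for flag in is_true_positive if flag)
    false_positives_seen = 0
    for idx in ranking:
        if remaining == 0:
            break
        if is_true_positive[idx]:
            remaining -= 1
        else:
            false_positives_seen += 1
    return false_positives_seen


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical ground truth: 1 = true positive, 0 = false positive.
labels = [1, 0, 0, 1, 0, 1, 0, 0]
# Hypothetical classifier scores (estimated probability of a true positive).
scores = [0.9, 0.2, 0.4, 0.7, 0.1, 0.8, 0.3, 0.6]

print("classifier cost:", inspection_cost(rank_by_score(scores), labels))
print("optimal cost:", inspection_cost(rank_by_score(labels), labels))
print("random cost:", inspection_cost(random_ranking(len(labels), seed=1), labels))
print("Spearman vs. optimal:", round(spearman(scores, labels), 3))
```

With these made-up scores, every true positive ranks above every false positive. The classifier's cost is therefore 0, matching the optimal ranking, while the random baseline's cost depends on the shuffle. Spearman's coefficient stays below 1 even though the separation is perfect, because the optimal ordering ties all true positives with each other and all false positives with each other. This is one reason to report a cost-style measure alongside a correlation.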
Files
Original bundle
Name: Enas_Alikhashashneh_thesis.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission