Feasibility of Automated Machine Learning Using Public Data for Use in Endoscopy
Abstract
Introduction: With recent successful applications of computer vision in gastroenterology and endoscopy, there has been strong interest among physicians in developing practical skills in artificial intelligence. Automated Machine Learning (AutoML) platforms may broaden access to complex deep learning algorithms that would otherwise be inaccessible, allowing physicians to build models for a variety of use cases simply by providing labeled data. We focused on three commonly used AutoML platforms, created by Microsoft, Amazon, and Google, that market their ability to create image classification and object detection models. Using labeled data from the publicly available SUN[1] colonoscopy dataset, we developed computer-aided diagnosis (CADx) and computer-aided detection (CADe) models on all three AutoML platforms.
Methods: The dataset used to evaluate model performance was the SUN (Showa University and Nagoya University) Colonoscopy Video Database. To create the models, the annotation files were parsed into a format readable by each platform and the data were uploaded to the respective platforms. The dataset was split 70/10/20 into training, validation, and test sets. We evaluated the CADx models using sensitivity, specificity, PPV, NPV, F1 score, AuROC, accuracy, precision, and recall; the CADe models were evaluated using precision, recall, and F1 score. We used analysis of variance (ANOVA) with an alpha of 0.05 to determine whether CADx model performance differed across platforms.
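The classification metrics listed above all derive from the confusion-matrix counts. A minimal sketch of those definitions, using illustrative counts rather than study data:

```python
# Hedged sketch of the CADx evaluation metrics named in the Methods.
# The counts passed in below are illustrative placeholders, not study data.

def cadx_metrics(tp, fp, tn, fn):
    """Compute classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)           # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                   # positive predictive value (precision)
    npv = tn / (tn + fn)                   # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}

# Example with made-up counts:
print(cadx_metrics(tp=95, fp=5, tn=90, fn=10))
```

Precision and recall are the same quantities as PPV and sensitivity, so the F1 score is their harmonic mean; AuROC additionally requires per-image prediction scores rather than hard labels.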
Results: The sensitivity of the three CADx models was 0.9996, 0.9801, and 0.9770 for Microsoft, Google, and Amazon, respectively; the specificity was 0.9993, 0.9665, and 0.9633. There was a statistically significant difference in the performance of the three CADx models: the F1 scores of the models built on the Microsoft, Google, and Amazon platforms were 0.9996, 0.9800, and 0.9768, respectively (P=0.0044). The F1 scores for the CADe models built on the Microsoft, Google, and Amazon platforms, using an IoU threshold of 0.5, were 0.9929, 0.9650, and 0.8980, respectively.
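The IoU threshold used for the CADe models measures the overlap between a predicted bounding box and the ground-truth box. A minimal sketch of that criterion, with illustrative box coordinates:

```python
# Hedged sketch of intersection-over-union (IoU), the box-overlap measure
# used with a 0.5 threshold to score CADe detections. Boxes are given as
# (x1, y1, x2, y2) pixel corners; the example coordinates are illustrative.

def iou(box_a, box_b):
    """Return the intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

# A prediction counts as a true positive only when IoU >= 0.5.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # half-overlapping boxes, IoU ≈ 0.33
```

Under this convention a detection that covers half of the true box still fails the 0.5 threshold, which is why the CADe F1 scores are sensitive to localization quality and not just to whether a polyp was flagged at all.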
Conclusions: Using minimal coding, we were able to create three models, all of which achieved high F1 scores (> 0.9) on CADe and CADx use cases. There was a statistically significant difference in the F1 scores of the models created by the AutoML platforms. Further analysis on larger datasets and on different landmarks is needed to determine whether Microsoft's AutoML platform consistently performs best across endoscopic computer vision tasks. AutoML platforms represent a practical entry point for endoscopists interested in exploring computer vision for GI endoscopy and may be an important catalyst for physician-driven innovation.