A Novel Multimodal Deep Image Analysis Model for Predicting Extraction/Non‐Extraction Decision
Abstract
Objective: This study aimed to develop a deep learning classifier capable of predicting the binary extraction/non-extraction decision from lateral cephalometric radiographs (LCRs) and intraoral scans (IOS), to serve as an additional decision-support tool for orthodontists.
Materials and methods: The dataset comprised LCRs and IOS from 617 patients (mean age: 18.2 years, 63.5% female) treated at the Indiana University School of Dentistry. Subjects were categorized into two groups: extraction (n = 192) and non-extraction (n = 425). Two sets of features were extracted from the IOS: traditional arch measurements and novel tooth spatial features. For the LCRs, features were derived using CephNet-based landmark detection (Land) and a convolutional autoencoder (AE), with dimensionality reduced by Principal Component Analysis (PCA). Models were evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV, or precision), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), and F1 score.
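As a rough illustration of the evaluation protocol described above (not the authors' code), the following Python sketch computes the listed diagnostic metrics from binary predictions; the function name, label encoding, and layout are hypothetical.

```python
import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Diagnostic metrics from binary labels (1 = extraction, 0 = non-extraction).
    Illustrative sketch only; names and conventions are assumptions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    sensitivity = tp / (tp + fn)              # true-positive rate (recall)
    specificity = tn / (tn + fp)              # true-negative rate
    ppv = tp / (tp + fp)                      # positive predictive value (precision)
    npv = tn / (tn + fn)                      # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity  # negative likelihood ratio
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "PPV": ppv, "NPV": npv,
            "LR+": lr_pos, "LR-": lr_neg, "F1": f1}
```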
Results: The IOS + Land model achieved the highest overall accuracy (77%) and F1 score (0.62), with strong specificity (83%) and PPV (62%). In contrast, the Land model yielded the highest sensitivity (82%), but at the cost of lower specificity (57%). McNemar's test revealed that the AE model was significantly less accurate than the IOS + AE (p = 0.048), IOS + Land (p = 0.006), and IOS + AE + Land (p = 0.005) models.
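For the pairwise comparisons, McNemar's test operates on the 2x2 table of agreements and disagreements between two classifiers' predictions on the same test cases. A minimal sketch, assuming paired per-case correctness indicators and using statsmodels (not the authors' implementation):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def compare_models(correct_a, correct_b):
    """McNemar's test on paired correctness indicators (True/False) for two
    models evaluated on the same test set. Illustrative only."""
    correct_a, correct_b = np.asarray(correct_a), np.asarray(correct_b)
    # 2x2 contingency table of (model A correct?, model B correct?) counts
    table = [
        [np.sum(correct_a & correct_b),  np.sum(correct_a & ~correct_b)],
        [np.sum(~correct_a & correct_b), np.sum(~correct_a & ~correct_b)],
    ]
    # Exact binomial test on the discordant pairs (off-diagonal counts)
    result = mcnemar(table, exact=True)
    return result.pvalue
```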
Conclusion: Deep learning models can predict the extraction/non-extraction decision using IOS and LCRs with high accuracy and diagnostic performance. Multimodal approaches, particularly those integrating IOS with cephalometric landmarks, demonstrate superior accuracy, sensitivity, and specificity compared to single-modality models.
