Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study

dc.contributor.author: Sun, Ju
dc.contributor.author: Peng, Le
dc.contributor.author: Li, Taihui
dc.contributor.author: Adila, Dyah
dc.contributor.author: Zaiman, Zach
dc.contributor.author: Melton-Meaux, Genevieve B.
dc.contributor.author: Ingraham, Nicholas E.
dc.contributor.author: Murray, Eric
dc.contributor.author: Boley, Daniel
dc.contributor.author: Switzer, Sean
dc.contributor.author: Burns, John L.
dc.contributor.author: Huang, Kun
dc.contributor.author: Allen, Tadashi
dc.contributor.author: Steenburg, Scott D.
dc.contributor.author: Wawira Gichoya, Judy
dc.contributor.author: Kummerfeld, Erich
dc.contributor.author: Tignanelli, Christopher J.
dc.contributor.department: Radiology and Imaging Sciences, School of Medicine
dc.date.accessioned: 2024-05-21T09:20:16Z
dc.date.available: 2024-05-21T09:20:16Z
dc.date.issued: 2022-06-01
dc.description.abstract: Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs. Materials and methods: A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 total real-time predictions and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with use of receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists. Results: Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0-0.8] vs 0.0 [IQR, 0.0-0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants compared with White participants. The COVID-19 AI diagnostic system had worse accuracy (63.5% correct) compared with radiologist predictions (radiologist 1 = 67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001 for both). Conclusion: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction.
dc.eprint.version: Final published version
dc.identifier.citation: Sun J, Peng L, Li T, et al. Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study. Radiol Artif Intell. 2022;4(4):e210217. Published 2022 Jun 1. doi:10.1148/ryai.210217
dc.identifier.uri: https://hdl.handle.net/1805/40862
dc.language.iso: en_US
dc.publisher: Radiological Society of North America
dc.relation.isversionof: 10.1148/ryai.210217
dc.relation.journal: Radiology: Artificial Intelligence
dc.rights: Publisher Policy
dc.source: PMC
dc.subject: Diagnosis
dc.subject: Classification
dc.subject: Application Domain
dc.subject: Infection
dc.subject: Lung
dc.title: Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study
dc.type: Article
ul.alternative.fulltext: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344211/
Files
Original bundle
Name: Sun2022Performance-PP.pdf
Size: 908.51 KB
Format: Adobe Portable Document Format