Browsing by Author "Wawira Gichoya, Judy"
Now showing 6 of 6 items
Item: A DICOM Framework for Machine Learning and Processing Pipelines Against Real-time Radiology Images (SpringerLink, 2021-08-17)
Authors: Kathiravelu, Pradeeban; Sharma, Puneet; Sharma, Ashish; Banerjee, Imon; Trivedi, Hari; Purkayastha, Saptarshi; Sinha, Priyanshu; Cadrin-Chenevert, Alexandre; Safdar, Nabile; Wawira Gichoya, Judy
Affiliation: BioHealth Informatics, School of Informatics and Computing

Real-time execution of machine learning (ML) pipelines on radiology images is difficult due to limited computing resources in clinical environments, whereas running them in research clusters requires efficient data transfer capabilities. We developed Niffler, an open-source Digital Imaging and Communications in Medicine (DICOM) framework that enables ML and processing pipelines in research clusters by efficiently retrieving images from the hospitals' PACS and extracting the metadata from the images. We deployed Niffler at our institution (Emory Healthcare, the largest healthcare network in the state of Georgia) and retrieved data from 715 scanners spanning 12 sites, up to 350 GB/day continuously in real time as a DICOM data stream over the past 2 years. We also used Niffler to retrieve images in bulk on demand, based on user-provided filters, to facilitate several research projects. This paper presents the architecture and three such use cases of Niffler. First, we executed an IVC filter detection and segmentation pipeline on abdominal radiographs in real time, which classified 989 test images with an accuracy of 96.0%. Second, we applied the Niffler Metadata Extractor to understand the operational efficiency of individual MRI systems based on calculated metrics. We benchmarked the accuracy of the calculated exam time windows by comparing Niffler against the Clinical Data Warehouse (CDW). Niffler accurately identified the scanners' examination timeframes and idling times, whereas the CDW falsely depicted several exam overlaps due to human errors. Third, with metadata extracted from the images by Niffler, we identified scanners with misconfigured time and reconfigured five scanners. Our evaluations highlight how Niffler enables real-time ML and processing pipelines in a research cluster.
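For readers unfamiliar with DICOM header extraction, the sketch below shows the kind of metadata pull a framework like Niffler performs, using the pydicom library. The tag list, directory layout, and function name are illustrative assumptions, not Niffler's actual implementation (which the authors release as open source).

```python
# Minimal sketch: extract DICOM header metadata the way a framework like
# Niffler might, using pydicom. Tag selection here is illustrative only.
from pathlib import Path

import pydicom

# Header fields of interest; real deployments typically extract many more.
FIELDS = ["Modality", "StationName", "StudyDate", "SeriesTime", "AcquisitionTime"]

def extract_metadata(dicom_dir: str) -> list[dict]:
    """Read every .dcm file under dicom_dir and pull selected header tags."""
    records = []
    for path in Path(dicom_dir).rglob("*.dcm"):
        # stop_before_pixels skips pixel data, so only headers are parsed.
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        records.append({f: getattr(ds, f, None) for f in FIELDS})
    return records

if __name__ == "__main__":
    for row in extract_metadata("./incoming_dicom"):
        print(row)
```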
Item: Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals (Oxford University Press, 2022)
Authors: Peng, Le; Luo, Gaoxiang; Walker, Andrew; Zaiman, Zachary; Jones, Emma K.; Gupta, Hemant; Kersten, Kristopher; Burns, John L.; Harle, Christopher A.; Magoc, Tanja; Shickel, Benjamin; Steenburg, Scott D.; Loftus, Tyler; Melton, Genevieve B.; Wawira Gichoya, Judy; Sun, Ju; Tignanelli, Christopher J.
Affiliation: Radiology and Imaging Sciences, School of Medicine

Objective: Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 2019 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations.

Materials and methods: We leveraged an FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data from 3 systems were pooled locally and federation was simulated. We compared a centralized (pooled) model against FedAvg and 3 personalized FL variations (FedProx, FedBN, and FedAMP).

Results: We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation compared with the personalized FL algorithms, although FedAvg had improved generalizability compared with them. On average, FedBN had the best rank performance on internal and external validation.

Conclusion: FedAvg can significantly improve the generalizability of the model compared with personalized FL algorithms, albeit at the cost of poorer internal validity. Personalized FL may offer an opportunity to develop algorithms that are both internally and externally validated.
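As a point of reference for the aggregation the study builds on, below is a minimal sketch of the FedAvg server step in plain NumPy: each client trains locally, and the server averages parameters weighted by local sample counts. The actual experiments ran on NVIDIA Clara Train SDK 4.0; the function, toy clients, and sizes here are illustrative assumptions, not the study's code.

```python
# Minimal sketch of the FedAvg aggregation step: the server computes a
# weighted average of per-client parameters, weighted by local sample counts.
import numpy as np

def fedavg(client_weights: list[list[np.ndarray]], client_sizes: list[int]) -> list[np.ndarray]:
    """Weighted average of per-client parameter lists (one array per layer)."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        acc = np.zeros_like(client_weights[0][layer])
        for weights, size in zip(client_weights, client_sizes):
            acc += (size / total) * weights[layer]
        averaged.append(acc)
    return averaged

# Toy round with 3 simulated clients and a 2-layer "model".
clients = [[np.full((2, 2), c), np.full((2,), c)] for c in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]  # local training-set sizes
print(fedavg(clients, sizes))  # weighted mean: 0.1*1 + 0.2*2 + 0.7*3 = 2.6
```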
Item: MedShift: Automated Identification of Shift Data for Medical Image Dataset Curation (IEEE, 2023)
Authors: Guo, Xiaoyuan; Wawira Gichoya, Judy; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon
Affiliation: Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering

Automated curation of noisy external data in the medical domain has long been in demand, as AI technologies must be validated on various sources with clean, annotated data. To curate a high-quality dataset, identifying the variance between internal and external sources is a fundamental step, as data distributions from different sources can vary significantly and subsequently affect the performance of AI models. The primary challenges in detecting data shifts are (1) restricted access to private data across healthcare institutions, which rules out manual detection, and (2) the lack of automated approaches that learn efficient shift-data representations without training samples. To overcome these problems, we propose an automated pipeline called MedShift that detects top-level shift samples and evaluates the significance of shift data without sharing data between the internal and external organizations. MedShift employs unsupervised anomaly detectors to learn the internal distribution and identify samples in external datasets that show significant shift, and we compare the detectors' performance. To quantify the effects of the detected shift data, we train a multi-class classifier that learns internal domain knowledge and evaluate its classification performance on each class in the external domains after dropping the shift data. We also propose a data quality metric to quantify the dissimilarity between the internal and external datasets. We verify the efficacy of MedShift on musculoskeletal radiograph (MURA) and chest X-ray datasets from more than one external source. Experiments show that our shift-data detection pipeline can help medical centers curate high-quality datasets more efficiently. The code can be found at https://github.com/XiaoyuanGuo/MedShift. An interface introduction video to visualize our results is available at https://youtu.be/V3BF0P1sxQE.

Item: Patient-specific COVID-19 resource utilization prediction using fusion AI model (Springer Nature, 2021-06-03)
Authors: Tariq, Amara; Celi, Leo Anthony; Newsome, Janice M.; Purkayastha, Saptarshi; Bhatia, Neal Kumar; Trivedi, Hari; Wawira Gichoya, Judy; Banerjee, Imon
Affiliation: BioHealth Informatics, School of Informatics and Computing

The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization, such as the need for hospitalization, from past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male, mean age 56.7 ± 16.8, 56% African American, 7% Hispanic) flagged as COVID-19 positive in 12 centers in the Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19-positive patient's need for hospitalization can be predicted at the time of the RT-PCR test using EMR data collected prior to the test. Five main EMR modalities, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, and F1-score (with 95% confidence intervals). The early fusion model was the most effective predictor, with an overall F1-score of 84% [CI 82.1–86.1]. The predictive performance of the model drops by 6% when using only recent clinical data and omitting the long-term medical history. Feature importance analysis indicates that a history of cardiovascular disease, emergency room visits in the year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.
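To make the fusion strategies concrete, below is a minimal sketch of early fusion: per-modality feature blocks are concatenated into one vector before a single classifier is trained. The synthetic data, modality widths, and choice of logistic regression are illustrative assumptions, not the study's model.

```python
# Minimal sketch of "early fusion": features from several EMR modalities are
# concatenated into one vector before a single classifier is trained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500  # toy patient count

# Synthetic per-modality feature blocks (demographics, medications, labs).
demographics = rng.normal(size=(n, 4))
medications = rng.normal(size=(n, 16))
labs = rng.normal(size=(n, 8))
y = rng.integers(0, 2, size=n)  # hospitalized or not (toy labels)

# Early fusion: concatenate the modalities, then fit one model on the result.
X = np.concatenate([demographics, medications, labs], axis=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", model.score(X, y))
```

Late fusion would instead train one model per modality and combine their predictions, which is why omitting a modality (such as long-term history) degrades the fused models differently.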
Item: Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study (Radiological Society of North America, 2022-06-01)
Authors: Sun, Ju; Peng, Le; Li, Taihui; Adila, Dyah; Zaiman, Zach; Melton-Meaux, Genevieve B.; Ingraham, Nicholas E.; Murray, Eric; Boley, Daniel; Switzer, Sean; Burns, John L.; Huang, Kun; Allen, Tadashi; Steenburg, Scott D.; Wawira Gichoya, Judy; Kummerfeld, Erich; Tignanelli, Christopher J.
Affiliation: Radiology and Imaging Sciences, School of Medicine

Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate the real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs.

Materials and methods: A total of 95,363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 total real-time predictions and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists.

Results: Participants positive for COVID-19 had higher COVID-19 diagnostic scores than participants negative for COVID-19 (median, 0.1 [IQR, 0.0-0.8] vs 0.0 [IQR, 0.0-0.1], respectively; P < .001). Real-time model performance was unchanged over 19 weeks of implementation (area under the receiver operating characteristic curve, 0.70; 95% CI: 0.66, 0.73). Model sensitivity was higher in men than in women (P = .01), whereas model specificity was higher in women (P = .001). Sensitivity was higher for Asian (P = .002) and Black (P = .046) participants than for White participants. The COVID-19 AI diagnostic system had worse accuracy (63.5% correct) than radiologist predictions (radiologist 1: 67.8% correct; radiologist 2: 68.6% correct; McNemar P < .001 for both).

Conclusion: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction.

Item: Theory of radiologist interaction with instant messaging decision support tools: A sequential-explanatory study (Public Library of Science, 2024-02-26)
Authors: Burns, John Lee; Wawira Gichoya, Judy; Kohli, Marc D.; Jones, Josette; Purkayastha, Saptarshi
Affiliation: Radiology and Imaging Sciences, School of Medicine

Radiology-specific clinical decision support systems (CDSS) and artificial intelligence are poorly integrated into the radiologist workflow. Current research and development efforts for radiology CDSS focus on 4 main interventions based around exam-centric time points: after image acquisition, intra-report support, post-report analysis, and radiology workflow-adjacent. We review the literature surrounding CDSS tools at these time points, requirements for CDSS workflow augmentation, and technologies that support clinician-to-computer workflow augmentation. We develop a theory of radiologist-decision tool interaction using a sequential explanatory study design. The study consists of 2 phases: a quantitative survey, followed by a qualitative interview study. The phase 1 survey identifies differences between average users and radiologist users of software interventions using the User Acceptance of Information Technology: Toward a Unified View (UTAUT) framework. Phase 2 semi-structured interviews provide narratives on why these differences are found. To build this theory, we propose a novel solution called Radibot, a conversational agent capable of engaging clinicians with CDSS as an assistant through existing instant messaging systems that support hospital communications. This work contributes an understanding of how radiologist users differ from average users, which software developers can use to increase satisfaction with CDSS tools within radiology.
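To make the Radibot concept concrete, here is a minimal sketch of the interaction pattern it implies: an agent relays a CDSS finding over an instant-messaging channel and records the clinician's reply. Every name, message format, and handler below is hypothetical; the paper proposes the design, and this listing does not include an implementation.

```python
# Minimal sketch of a Radibot-like interaction loop: a chat agent relays a
# CDSS finding to a radiologist over instant messaging and captures the
# response. All structures here are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class CdssFinding:
    exam_id: str
    message: str  # human-readable summary of the CDSS output

def format_prompt(finding: CdssFinding) -> str:
    """Render a finding as a chat message with simple accept/reject actions."""
    return (f"[CDSS] Exam {finding.exam_id}: {finding.message}\n"
            f"Reply 'ack' to accept or 'dismiss' to reject.")

def handle_reply(finding: CdssFinding, reply: str) -> str:
    """Record the clinician's decision; a real agent would log it to the CDSS."""
    if reply.strip().lower() == "ack":
        return f"Exam {finding.exam_id}: finding accepted."
    return f"Exam {finding.exam_id}: finding dismissed."

finding = CdssFinding("RAD-001", "Possible misconfigured acquisition time.")
print(format_prompt(finding))
print(handle_reply(finding, "ack"))
```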