Browsing by Subject "Accuracy"
Now showing 1 - 10 of 18
Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study
(JMIR, 2024-09-30) Franc, Jeffrey Micheal; Hertelendy, Attila Julius; Cheng, Lenard; Hata, Ryan; Verde, Manuela; Emergency Medicine, School of Medicine
Background: The release of ChatGPT (OpenAI) in November 2022 drastically reduced the barrier to using artificial intelligence by allowing a simple web-based text interface to a large language model (LLM). One use case where ChatGPT could be useful is in triaging patients at the site of a disaster using the Simple Triage and Rapid Treatment (START) protocol. However, LLMs experience several common errors including hallucinations (also called confabulations) and prompt dependency.
Objective: This study addresses the research problem: "Can ChatGPT adequately triage simulated disaster patients using the START protocol?" by measuring three outcomes: repeatability, reproducibility, and accuracy.
Methods: Nine prompts were developed by 5 disaster medicine physicians. A Python script queried ChatGPT Version 4 for each prompt combined with 391 validated simulated patient vignettes. Ten repetitions of each combination were performed for a total of 35,190 simulated triages. A reference standard START triage code for each simulated case was assigned by 2 disaster medicine specialists (JMF and MV), with a third specialist (LC) added if the first two did not agree. Results were evaluated using a gage repeatability and reproducibility study (gage R and R). Repeatability was defined as variation due to repeated use of the same prompt. Reproducibility was defined as variation due to the use of different prompts on the same patient vignette. Accuracy was defined as agreement with the reference standard.
Results: Although 35,102 (99.7%) queries returned a valid START score, there was considerable variability.
Repeatability (use of the same prompt repeatedly) was 14% of the overall variation. Reproducibility (use of different prompts) was 4.1% of the overall variation. The accuracy of ChatGPT for START was 63.9% with a 32.9% overtriage rate and a 3.1% undertriage rate. Accuracy varied by prompt with a maximum of 71.8% and a minimum of 46.7%.
Conclusions: This study indicates that ChatGPT version 4 is insufficient to triage simulated disaster patients via the START protocol. It demonstrated suboptimal repeatability and reproducibility. The overall accuracy of triage was only 63.9%. Health care professionals are advised to exercise caution while using commercial LLMs for vital medical determinations, given that these tools may commonly produce inaccurate data, colloquially referred to as hallucinations or confabulations. Artificial intelligence-guided tools should undergo rigorous statistical evaluation, using methods such as gage R and R, before implementation into clinical settings.

Accuracy of Orthodontic Soft Tissue Prediction Software between Different Ethnicities
(2019) Stewart, Kelton; Patel, Pranali; Eckert, George; Rigsbee III, OH; Hughes, Jay; Utreja, Achint
Objective: The objective of this study was to assess the accuracy of the soft tissue prediction module of Dolphin Imaging Software (DIS) in patients requiring extractions as part of the orthodontic treatment plan and compare its accuracy between different ethnicities.
Materials and Methods: Initial and final records of 57 patients from three ethnic groups (African Americans, Caucasians, and Hispanics) who completed orthodontic treatment were included for assessment. The identified cases were managed non-surgically with dental extractions. A predictive profile was generated using DIS and compared to post-treatment lateral photographs. Actual and predictive profile photographs were compared using five designated parameters. The assessment parameters were evaluated using a manual protractor.
ANOVA was used to compare differences between actual and predicted parameters between the specified groups and ICC was used to assess correlations between the data.
Results: Neither ethnicity nor gender had a significant effect on the difference between predicted and final values. No significant difference was noted between the predicted and final images for the nasolabial angle. Significant differences were observed for the mentolabial fold, upper lip to E-line, and lower lip to E-line between predicted and actual images. Soft tissue convexity was also significantly different (p=0.019), and a clinically significant difference was found for the mentolabial fold.
Conclusion: Ethnicity and gender had no impact on the accuracy of predicted and actual image parameters. Overall, DIS demonstrated acceptable accuracy when simulating soft tissue changes after extraction therapy. Additional research on the accuracy of the software is warranted.

Blackford County Horizontal Accuracy Report
(2006-01-11T16:31:40Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Blackford County, Indiana

Dearborn County Horizontal Accuracy Report
(2006-01-11T04:46:04Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Dearborn County, Indiana

Decatur County Horizontal Accuracy Report
(2006-01-11T04:54:49Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Decatur County, Indiana

Fayette County Horizontal Accuracy Report
(2006-01-11T04:51:31Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Fayette County, Indiana

Franklin County Horizontal Accuracy Report
(2006-01-11T04:50:52Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Franklin County, Indiana

Jennings County Horizontal Accuracy Report
(2006-01-11T04:52:11Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Jennings County, Indiana

Ohio County Horizontal Accuracy Report
(2006-01-11T04:48:31Z) Report and table verifying the accuracy of the 2005 digital aerial photography (orthophotography) for Ohio County, Indiana

Precision and accuracy of hyperglycemic clamps in a multicenter study
(American Physiological Society, 2021) Mather, Kieren J.; Tjaden, Ashley H.; Hoehn, Adam; Nadeau, Kristen J.; Buchanan, Thomas A.; Kahn, Steven E.; Arslanian, Silva A.; Caprio, Sonia; Atkinson, Karen M.; Cree-Green, Melanie; Utzschneider, Kristina M.; Edelstein, Sharon L.; RISE Consortium; Medicine, School of Medicine
Application of glucose clamp methodologies in multicenter studies brings challenges for standardization. The Restoring Insulin Secretion (RISE) Consortium implemented a hyperglycemic clamp protocol across seven centers using a combination of technical and management approaches to achieve standardization. Two-stage hyperglycemic clamps with glucose targets of 200 mg/dL and >450 mg/dL were performed utilizing a centralized spreadsheet-based algorithm that guided dextrose infusion rates using bedside plasma glucose measurements. Clamp operators received initial and repeated training with ongoing feedback based on surveillance of clamp performance. The precision and accuracy of the achieved stage-specific glucose targets were evaluated, including differences by study center. We also evaluated robustness of the method to baseline physiologic differences and on-study treatment effects. The RISE approach produced high overall precision (3%–9% variance in achieved plasma glucose from target at various times across the procedure) and accuracy (SD < 10% overall).
Statistically significant but numerically small differences in achieved target glucose concentrations were observed across study centers, within the magnitude of the observed technical variability. Variation of the achieved target glucose over time in placebo-treated individuals was low (<3% variation), and the method was robust to differences in baseline physiology (youth vs. adult, IGT vs. diabetes status) and differences in physiology induced by study treatments. The RISE approach to standardization of the hyperglycemic clamp methodology across multiple study centers produced technically excellent standardization of achieved glucose concentrations. This approach provides a reliable method for implementing glucose clamp methodology across multiple study centers.
NEW & NOTEWORTHY: The Restoring Insulin Secretion (RISE) study centers undertook hyperglycemic clamps using a simplified methodology and a decision guidance algorithm implemented in an easy-to-use spreadsheet. This approach, combined with active management including ongoing central data surveillance and routine feedback to study centers, produced technically excellent standardization of achieved glucose concentrations on repeat studies within and across study centers.