An Assessment of ChatGPT’s Performance as a Patient Counseling Tool: Exploring the Potential Integration of Large Language Model-based ChatBots into Online Patient Portals

Price, Charles; Brougham, Albert; Burton, Kyle; Dexter, Paul
2024-04-23
2024-04-23
2024-04-26
Price CG, Brougham AJ, Burton KA, Dexter PR. An Assessment of ChatGPT’s Performance as a Patient Counseling Tool: Exploring the Potential Integration of Large Language Model-based ChatBots into Online Patient Portals. Poster presented at: Indiana University School of Medicine Education Day; April 26, 2024; Indianapolis, IN.
https://hdl.handle.net/1805/40156

BACKGROUND: With the advancement of online patient portals, patients now have unprecedented access to their healthcare providers. This has increased the physician burden associated with electronic inbox overload [1]. Recent developments in artificial intelligence, specifically Large Language Model-based chatbots (e.g., ChatGPT), may prove useful in reducing that burden. Can ChatGPT be used reliably as a patient counseling tool? ChatGPT can be described as “an advanced language model that uses deep learning techniques to produce human-like responses to natural language inputs” [5]. Despite concerns surrounding this technology (e.g., the spread of misinformation, inconsistent reproducibility, and “hallucination” phenomena), several studies have demonstrated ChatGPT’s clinical competence. One study examined ChatGPT’s ability to answer frequently asked fertility-related questions and found the model’s responses comparable to the CDC’s published answers with respect to length, factual content, and sentiment [6]. Additionally, ChatGPT was found capable of achieving a passing score on the USMLE Step 1 licensing exam, a benchmark set for third-year medical students [7].

OBJECTIVE: This study aims to further evaluate the clinical decision making of ChatGPT, specifically its ability to provide accurate medical counseling in response to frequently asked patient questions within the field of cardiology.

METHODS: Thirty-five frequently asked cardiovascular questions (FAQs) published by the OHSU Knight Cardiovascular Institute were submitted to ChatGPT 4 (Classic Version) by OpenAI. ChatGPT’s answers and the answers provided by the OHSU Knight Cardiovascular Institute were compared with respect to length, factual content, sentiment, and the presence of incorrect or false statements.

RESULTS: Compared with the responses published by OHSU for the 35 FAQs, ChatGPT’s responses were significantly longer (295.4 vs 112.5 words per response) and included more factual statements per response (7.2 vs 3.5). ChatGPT produced responses of similar sentiment polarity (0.10 vs 0.11 on a scale from -1 (negative) to 1 (positive)) and subjectivity (0.46 vs 0.43 on a scale from 0 (objective) to 1 (subjective)). None (0%) of ChatGPT’s factual statements were found to be false or harmful.

CONCLUSIONS: The results of this study provide valuable insight into the clinical “knowledge” and fluency of ChatGPT, demonstrating its ability to produce accurate and effective responses to frequently asked cardiovascular questions. Larger-scale studies with an additional focus on ChatGPT’s reproducibility and consistency may have important implications for the future of patient education. Implementing AI-based chatbots in online patient portals may assist physicians by alleviating the growing burden of electronic inbox volume.

en-US
Poster
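
The length comparison described in METHODS and RESULTS can be illustrated with a minimal sketch. The abstract does not say how words were counted, so whitespace splitting is an assumption here, and the function name `words_per_response` and the sample texts are hypothetical placeholders rather than the study's data. Only the four reported means (295.4, 112.5, 7.2, 3.5) come from the abstract.

```python
from statistics import mean


def words_per_response(responses):
    """Mean number of whitespace-separated words per response (assumed counting rule)."""
    return mean(len(text.split()) for text in responses)


# Hypothetical placeholder answers, NOT the ChatGPT or OHSU texts from the study.
chatbot_answers = [
    "Regular exercise and a balanced diet can lower blood pressure over time.",
    "Chest pain that spreads to the arm or jaw needs urgent medical attention.",
]
reference_answers = [
    "Exercise and diet help lower blood pressure.",
    "Seek urgent care for spreading chest pain.",
]

chatbot_mean = words_per_response(chatbot_answers)
reference_mean = words_per_response(reference_answers)

# Ratios of the means reported in RESULTS (ChatGPT vs OHSU).
length_ratio = 295.4 / 112.5  # words per response
facts_ratio = 7.2 / 3.5       # factual statements per response

print(f"placeholder means: {chatbot_mean} vs {reference_mean} words per response")
print(f"reported length ratio: {length_ratio:.2f}")
print(f"reported factual-statement ratio: {facts_ratio:.2f}")
```

Under these assumptions, the reported means correspond to ChatGPT responses that are roughly 2.6 times as long as the OHSU responses and contain roughly twice as many factual statements.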