Generalizability and portability of natural language processing system to extract individual social risk factors

Magoc, Tanja; Allen, Katie S.; McDonnell, Cara; Russo, Jean-Paul; Cummins, Jonathan; Vest, Joshua R.; Harle, Christopher A.

dc.contributor.author	Magoc, Tanja
dc.contributor.author	Allen, Katie S.
dc.contributor.author	McDonnell, Cara
dc.contributor.author	Russo, Jean-Paul
dc.contributor.author	Cummins, Jonathan
dc.contributor.author	Vest, Joshua R.
dc.contributor.author	Harle, Christopher A.
dc.contributor.department	Emergency Medicine, School of Medicine
dc.date.accessioned	2024-10-29T11:48:51Z
dc.date.available	2024-10-29T11:48:51Z
dc.date.issued	2023
dc.description.abstract	Objective: The objective of this study is to validate and report on the portability and generalizability of a Natural Language Processing (NLP) method, originally developed at a different institution, to extract individual social factors from clinical notes. Materials and methods: A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was applied to all notes written during 6 months at another institution. Ten percent of the notes classified as positive by the NLP model and the same number of negatively classified notes were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated. Results: More than 6 million notes were processed at the receiving site by the NLP model, which resulted in about 13,000 and 19,000 notes classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset, with all measures over 0.87 for both social factors. Discussion: Our study illustrated the need to accommodate institution-specific note-writing templates as well as clinical terminology of emergent diseases when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed superior performance to similar generalizability studies for extracting social factors. Conclusion: A rule-based NLP model to extract social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.
dc.eprint.version	Author's manuscript
dc.identifier.citation	Magoc T, Allen KS, McDonnell C, et al. Generalizability and portability of natural language processing system to extract individual social risk factors. Int J Med Inform. 2023;177:105115. doi:10.1016/j.ijmedinf.2023.105115
dc.identifier.uri	https://hdl.handle.net/1805/44313
dc.language.iso	en_US
dc.publisher	Elsevier
dc.relation.isversionof	10.1016/j.ijmedinf.2023.105115
dc.relation.journal	International Journal of Medical Informatics
dc.rights	Publisher Policy
dc.source	PMC
dc.subject	Generalizability
dc.subject	Natural language processing
dc.subject	Portability
dc.subject	Rule-based
dc.subject	Social risk factors
dc.title	Generalizability and portability of natural language processing system to extract individual social risk factors
dc.type	Article

Files

Original bundle

Name: Magoc2023Generalizability-AAM.pdf
Size: 406.01 KB
Format: Adobe Portable Document Format

Collections

Open Access Policy Articles
Department of Emergency Medicine Works