Natural language processing-driven state machines to extract social factors from unstructured clinical documentation

dc.contributor.author  Allen, Katie S.
dc.contributor.author  Hood, Dan R.
dc.contributor.author  Cummins, Jonathan
dc.contributor.author  Kasturi, Suranga
dc.contributor.author  Mendonca, Eneida A.
dc.contributor.author  Vest, Joshua R.
dc.contributor.department  Health Policy and Management, School of Public Health
dc.date.accessioned  2023-11-29T11:56:11Z
dc.date.available  2023-11-29T11:56:11Z
dc.date.issued  2023-04-18
dc.description.abstract  Objective: This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. Finalized models were validated on data from a separate health system to assess generalizability. Materials and Methods: Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study used n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling used a set percentage of notes, determined by the prevalence of each social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. Results: PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Discussion: We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. Conclusion: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may form part of a strategy to measure social factors within an institution.
dc.eprint.version  Final published version
dc.identifier.citation  Allen KS, Hood DR, Cummins J, Kasturi S, Mendonca EA, Vest JR. Natural language processing-driven state machines to extract social factors from unstructured clinical documentation. JAMIA Open. 2023;6(2):ooad024. Published 2023 Apr 18. doi:10.1093/jamiaopen/ooad024
dc.identifier.uri  https://hdl.handle.net/1805/37204
dc.language.iso  en_US
dc.publisher  Oxford University Press
dc.relation.isversionof  10.1093/jamiaopen/ooad024
dc.relation.journal  JAMIA Open
dc.rights  Attribution-NonCommercial 4.0 International
dc.rights.uri  http://creativecommons.org/licenses/by-nc/4.0/
dc.source  PMC
dc.subject  Clinical data
dc.subject  Natural language processing
dc.subject  Social determinants of health
dc.subject  Social factors
dc.title  Natural language processing-driven state machines to extract social factors from unstructured clinical documentation
dc.type  Article
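
The abstract above describes a keyword-driven, rule-based approach (n-gram keyword seeds feeding simple NLP state machines) evaluated with PPV, negative predictive value, sensitivity, and specificity. As a rough illustration only, and not the authors' published implementation, the following Python sketch shows how such a keyword/negation matcher and its confusion-matrix metrics might be assembled; all keyword lists, negation cues, and function names here are hypothetical.

import re

# Hypothetical keyword seeds (illustrative only; not the study's actual n-grams).
KEYWORDS = {
    "housing": ["homeless", "shelter", "eviction", "housing instability"],
    "financial": ["unable to afford", "financial strain", "cannot pay"],
    "unemployment": ["unemployed", "laid off", "lost his job", "lost her job"],
}

# Hypothetical negation cues that move the machine back to a "not present" state.
NEGATIONS = {"no", "denies", "not", "without"}


def flag_social_factors(note_text: str) -> dict:
    """Return {factor: True/False} for the presence of each social factor.

    Simple two-state logic per sentence: a keyword match moves the machine
    to a "candidate" state; a negation cue in the same sentence resets it.
    """
    flags = {factor: False for factor in KEYWORDS}
    for sentence in re.split(r"[.\n]", note_text.lower()):
        tokens = set(sentence.split())
        negated = bool(tokens & NEGATIONS)
        for factor, phrases in KEYWORDS.items():
            if not negated and any(phrase in sentence for phrase in phrases):
                flags[factor] = True
    return flags


def ppv_and_specificity(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Positive predictive value and specificity from confusion-matrix counts."""
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return ppv, specificity


if __name__ == "__main__":
    note = "Patient was recently laid off and reports an eviction notice. Denies homelessness."
    print(flag_social_factors(note))
    # {'housing': True, 'financial': False, 'unemployment': True}

In the study itself, the rules, keyword n-grams, and handling of different note types were developed and tuned over multiple training and evaluation cycles across two health systems; this sketch only conveys the general shape of a rule-based matcher plus the PPV/specificity calculation.
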
Files
Original bundle
  Name: ooad024.pdf
  Size: 410.07 KB
  Format: Adobe Portable Document Format
License bundle
  Name: license.txt
  Size: 1.99 KB
  Format: Item-specific license agreed upon to submission