Enhancing an enterprise data warehouse for research with data extracted using natural language processing

dc.contributor.author: Magoc, Tanja
dc.contributor.author: Everson, Russell
dc.contributor.author: Harle, Christopher A.
dc.contributor.department: Health Policy and Management, School of Public Health
dc.date.accessioned: 2024-02-09T11:37:44Z
dc.date.available: 2024-02-09T11:37:44Z
dc.date.issued: 2023-06-13
dc.description.abstract: Objective: This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to EDW4R. Materials and methods: Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was utilized for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. Results: Two weeks were needed to enhance EDW4R with data from 250 million clinical notes. NLP generated 16 and 39% increase in data availability for two variables. Discussion: Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. Conclusion: Given the value added by data extracted using NLP, it is important to enhance EDW4R with these data to enable research teams without NLP expertise to benefit from value added by NLP models.
dc.eprint.version: Final published version
dc.identifier.citation: Magoc T, Everson R, Harle CA. Enhancing an enterprise data warehouse for research with data extracted using natural language processing. J Clin Transl Sci. 2023;7(1):e149. Published 2023 Jun 13. doi:10.1017/cts.2023.575
dc.identifier.uri: https://hdl.handle.net/1805/38352
dc.language.iso: en_US
dc.publisher: Cambridge University Press
dc.relation.isversionof: 10.1017/cts.2023.575
dc.relation.journal: Journal of Clinical and Translational Science
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.source: PMC
dc.subject: Natural language processing
dc.subject: Enterprise data warehouse for research
dc.subject: Electronic health records
dc.subject: Data service
dc.subject: Smoking behavior
dc.subject: Rule-based
dc.subject: ETL
dc.title: Enhancing an enterprise data warehouse for research with data extracted using natural language processing
dc.type: Article
Files
Original bundle
Name: S2059866123005757a.pdf
Size: 578.54 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission