Classifying early infant feeding status from clinical notes using natural language processing and machine learning

dc.contributor.authorLemas, Dominick J.
dc.contributor.authorDu, Xinsong
dc.contributor.authorRouhizadeh, Masoud
dc.contributor.authorLewis, Braeden
dc.contributor.authorFrank, Simon
dc.contributor.authorWright, Lauren
dc.contributor.authorSpirache, Alex
dc.contributor.authorGonzalez, Lisa
dc.contributor.authorCheves, Ryan
dc.contributor.authorMagalhães, Marina
dc.contributor.authorZapata, Ruben
dc.contributor.authorReddy, Rahul
dc.contributor.authorXu, Ke
dc.contributor.authorParker, Leslie
dc.contributor.authorHarle, Chris
dc.contributor.authorYoung, Bridget
dc.contributor.authorLouis-Jaques, Adetola
dc.contributor.authorZhang, Bouri
dc.contributor.authorThompson, Lindsay
dc.contributor.authorHogan, William R.
dc.contributor.authorModave, François
dc.contributor.departmentHealth Policy and Management, School of Public Health
dc.date.accessioned2024-07-11T18:25:14Z
dc.date.available2024-07-11T18:25:14Z
dc.date.issued2024-04-03
dc.description.abstractThe objective of this study was to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Notes were annotated using TeamTat to uniquely classify each clinical note according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. Our modeling corpus was a balanced sample, with an equal number of clinical notes in each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification in this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother’s milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after balancing and downsampling. The XGBoost model outperformed all others, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in electronic health records to classify infant feeding status. Early identification of breastfeeding status using NLP on unstructured electronic health record data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
dc.eprint.versionFinal published version
dc.identifier.citationLemas DJ, Du X, Rouhizadeh M, et al. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep. 2024;14(1):7831. Published 2024 Apr 3. doi:10.1038/s41598-024-58299-x
dc.identifier.urihttps://hdl.handle.net/1805/42137
dc.language.isoen_US
dc.publisherSpringer Nature
dc.relation.isversionof10.1038/s41598-024-58299-x
dc.relation.journalScientific Reports
dc.rightsAttribution 4.0 International
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/
dc.sourcePMC
dc.subjectComputational biology and bioinformatics
dc.subjectComputational models
dc.subjectMachine learning
dc.titleClassifying early infant feeding status from clinical notes using natural language processing and machine learning
dc.typeArticle
Files
Original bundle
Name: Lemas2024Classifying-CCBY.pdf
Size: 937.11 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission