Classifying early infant feeding status from clinical notes using natural language processing and machine learning

Lemas, Dominick J.; Du, Xinsong; Rouhizadeh, Masoud; Lewis, Braeden; Frank, Simon; Wright, Lauren; Spirache, Alex; Gonzalez, Lisa; Cheves, Ryan; Magalhães, Marina; Zapata, Ruben; Reddy, Rahul; Xu, Ke; Parker, Leslie; Harle, Chris; Young, Bridget; Louis‑Jaques, Adetola; Zhang, Bouri; Thompson, Lindsay; Hogan, William R.; Modave, François

dc.contributor.author	Lemas, Dominick J.
dc.contributor.author	Du, Xinsong
dc.contributor.author	Rouhizadeh, Masoud
dc.contributor.author	Lewis, Braeden
dc.contributor.author	Frank, Simon
dc.contributor.author	Wright, Lauren
dc.contributor.author	Spirache, Alex
dc.contributor.author	Gonzalez, Lisa
dc.contributor.author	Cheves, Ryan
dc.contributor.author	Magalhães, Marina
dc.contributor.author	Zapata, Ruben
dc.contributor.author	Reddy, Rahul
dc.contributor.author	Xu, Ke
dc.contributor.author	Parker, Leslie
dc.contributor.author	Harle, Chris
dc.contributor.author	Young, Bridget
dc.contributor.author	Louis‑Jaques, Adetola
dc.contributor.author	Zhang, Bouri
dc.contributor.author	Thompson, Lindsay
dc.contributor.author	Hogan, William R.
dc.contributor.author	Modave, François
dc.contributor.department	Health Policy and Management, Richard M. Fairbanks School of Public Health
dc.date.accessioned	2024-07-11T18:25:14Z
dc.date.available	2024-07-11T18:25:14Z
dc.date.issued	2024-04-03
dc.description.abstract	The objective of this study was to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Models were compared on overall accuracy, precision, recall, and F1 score. Our modeling corpus was a balanced sample with an equal number of clinical notes in each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification in this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother’s milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after balancing and downsampling. The XGBoost model outperformed all others, achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in electronic health records to classify infant feeding status. Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
dc.eprint.version	Final published version
dc.identifier.citation	Lemas DJ, Du X, Rouhizadeh M, et al. Classifying early infant feeding status from clinical notes using natural language processing and machine learning. Sci Rep. 2024;14(1):7831. Published 2024 Apr 3. doi:10.1038/s41598-024-58299-x
dc.identifier.uri	https://hdl.handle.net/1805/42137
dc.language.iso	en_US
dc.publisher	Springer Nature
dc.relation.isversionof	10.1038/s41598-024-58299-x
dc.relation.journal	Scientific Reports
dc.rights	Attribution 4.0 International	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0
dc.source	PMC
dc.subject	Computational biology and bioinformatics
dc.subject	Computational models
dc.subject	Machine learning
dc.title	Classifying early infant feeding status from clinical notes using natural language processing and machine learning
dc.type	Article
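
The evaluation steps named in the abstract (balancing the classes by downsampling, then reporting accuracy and macro-averaged precision, recall, and F1) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the toy labels, and the random seed are all assumptions made for the example, and only the three class names come from the abstract.

```python
import random
from collections import Counter

# The three classes named in the abstract; everything else here is illustrative.
CLASSES = ["breast", "formula/bottle", "missing"]


def downsample_balanced(labels, seed=0):
    """Return sorted indices that keep an equal number of notes per class.

    Every class is randomly reduced to the size of the rarest class.
    """
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_keep = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


def macro_scores(y_true, y_pred, classes=CLASSES):
    """Compute accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging scores each class separately, then takes the
    unweighted mean, so every class counts equally.
    """
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, made-up labels: imbalanced before downsampling.
toy_labels = ["breast"] * 4 + ["formula/bottle"] * 3 + ["missing"] * 2
kept = downsample_balanced(toy_labels)
print(Counter(toy_labels[i] for i in kept))

# Toy, made-up predictions on a balanced set of six notes.
y_true = ["breast", "breast", "formula/bottle", "formula/bottle", "missing", "missing"]
y_pred = ["breast", "formula/bottle", "formula/bottle", "formula/bottle", "missing", "missing"]
print(macro_scores(y_true, y_pred))
```

On a class-balanced evaluation set, macro-averaged recall is arithmetically equal to overall accuracy, which is consistent with the abstract reporting 90.1% for both.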

Files

Original bundle

Name: Lemas2024Classifying-CCBY.pdf
Size: 937.11 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission

Collections

Open Access Policy Articles
Department of Health Policy and Management Works