Predicting Childhood Obesity Using Machine Learning: Practical Considerations

Previous studies demonstrate the feasibility of predicting obesity using various machine learning techniques; however, they do not address the limitations of these methods in real-life settings, where the data available for each child may vary. We investigated how much medical history machine learning models require to accurately predict body mass index (BMI) during early childhood. Within a longitudinal dataset of children aged 0–4 years, we developed predictive models based on long short-term memory (LSTM), a recurrent neural network architecture, using electronic health record (EHR) data from 2 to 8 prior clinical encounters (history encounters) to estimate child BMI. We developed separate, sex-stratified models using 80% of the data for training and 20% for external validation. We evaluated model performance using K-fold cross-validation, mean absolute error (MAE), and Pearson’s correlation coefficient (R²). With two history encounters and a 4-month prediction, the models yielded a high prediction error and a low correlation between predicted and actual BMI (MAE of 1.60 for girls and 1.49 for boys). Model performance improved with additional history encounters, but the improvement was not significant beyond five history encounters. The combined model outperformed the sex-stratified models, with an MAE of 0.98 (SD 0.03) and R² = 0.72. Our models show that five history encounters are sufficient to predict BMI prior to age 4 for both boys and girls. Moreover, starting from an initial dataset with more than 269 exposure variables, we identified a limited set of 24 variables that can facilitate BMI prediction in early childhood. Nine of these final variables are collected once, and the remaining 15 need to be updated at each visit.
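As a rough illustration of the two evaluation metrics named in the abstract, the following minimal Python sketch computes the mean absolute error and Pearson's correlation coefficient between actual and predicted BMI values. The function names and the four BMI values are hypothetical and chosen only for illustration; they are not taken from the study, and this is not the authors' evaluation code.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical BMI values (kg/m^2), made up for this example only.
actual_bmi = [16.0, 17.0, 15.0, 18.0]
predicted_bmi = [16.5, 16.5, 15.5, 18.5]

mae = mean_absolute_error(actual_bmi, predicted_bmi)
r = pearson_r(actual_bmi, predicted_bmi)
print(f"MAE = {mae:.2f}, r = {r:.2f}, r squared = {r * r:.2f}")
```

A lower MAE means predictions are closer to the measured BMI on average, while a correlation closer to 1 means predicted and actual values rise and fall together. Squaring Pearson's r gives a value on the same 0 to 1 scale as an R² statistic.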

Keywords

BMI, childhood obesity, machine learning, EHR

Cite As

Cheng, E. R., Steinhardt, R., & Ben Miled, Z. (2022). Predicting Childhood Obesity Using Machine Learning: Practical Considerations. BioMedInformatics, 2(1), Article 1. https://doi.org/10.3390/biomedinformatics2010012

ISSN

2673-7426

Journal

BioMedInformatics

Rights

Attribution 4.0 International

Type

Article

Permanent Link

https://hdl.handle.net/1805/30809

DOI

https://doi.org/10.3390/biomedinformatics2010012

Version

Final published version

Collections

Open Access Policy Articles
Department of Pediatrics Works
