Derivation and Validation of an Algorithm for Maternal-Child Linkage in Electronic Health Records
Abstract
Introduction: We created a probabilistic maternal-child electronic health record (EHR) linkage algorithm to promote clinical research in maternal-child health.
Methods: We used EHR data from 1994 to 2024 to train an XGBoost model to predict maternal-child linkages. Predictor variables were standard EHR elements: first name, last name, birthdate, address, phone number, and email. An EHR-embedded maternal-child indicator served as the deterministic outcome.
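To make the predictor variables concrete, the following is a minimal sketch of how a candidate mother-child pair might be turned into a feature vector for a classifier. The abstract does not say how fields were compared, so everything here is an assumption for illustration: the `Record` type, the `similarity` and `pair_features` helpers, the use of `difflib` string similarity, and the feature names are all hypothetical, and the sample records are invented.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical patient record holding the EHR elements named in the abstract."""
    first_name: str
    last_name: str
    birthdate: date
    address: str
    phone: str
    email: str


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed comparison method)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def exact_match(a: str, b: str) -> float:
    """1.0 only when both values are present and identical, else 0.0."""
    return float(bool(a) and bool(b) and a == b)


def pair_features(mother: Record, child: Record) -> dict:
    """Build illustrative comparison features for one candidate pair."""
    return {
        "first_name_sim": similarity(mother.first_name, child.first_name),
        "last_name_sim": similarity(mother.last_name, child.last_name),
        "address_sim": similarity(mother.address, child.address),
        "phone_match": exact_match(mother.phone, child.phone),
        "email_match": exact_match(mother.email, child.email),
        # Age gap in years; plausible maternal ages fall in a limited range.
        "age_gap_years": (child.birthdate - mother.birthdate).days / 365.25,
    }


# Invented example records, not drawn from any real data.
mother = Record("Ana", "Rivera", date(1990, 5, 1), "12 Oak St", "555-0100", "")
child = Record("Leo", "Rivera", date(2020, 5, 1), "12 Oak St", "555-0100", "")
features = pair_features(mother, child)
```

Feature dictionaries of this kind could be assembled into a matrix and passed to a gradient-boosted classifier, with the EHR-embedded indicator as the label. Treating a missing email as a non-match rather than a match is a design choice in this sketch, not something the source specifies.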
Results: From 82 million unique records, 6.2 billion potential pairs met blocking criteria. Of the potential pairs, 33 364 674 contained the deterministic indicator and were used as cases, and an equal number of controls were randomly sampled. The final model obtained an accuracy of 92%, a precision of 98%, a recall of 87%, and an F1-score of 92%.
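For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and F1-score are computed from a confusion matrix, and confirms that the reported F1-score is the harmonic mean of the reported precision and recall. The function names and the confusion counts are hypothetical; the counts are invented for illustration and are not the study's data.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical balanced sample: 100 cases and 100 controls (invented counts).
metrics = classification_metrics(tp=87, fp=2, fn=13, tn=98)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from(precision=0.98, recall=0.87)
```

Because cases and controls were sampled in equal numbers, accuracy here weights errors on linked and unlinked pairs equally; performance on the full set of blocked pairs, where true links are rare, would need to be assessed separately.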
Conclusion: We derived and validated a probabilistic maternal-child linkage algorithm using routinely collected EHR data elements that could benefit future observational research in maternal-child health.