A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program

dc.contributor.author: Lorman, Vitaly
dc.contributor.author: Razzaghi, Hanieh
dc.contributor.author: Song, Xing
dc.contributor.author: Morse, Keith
dc.contributor.author: Utidjian, Levon
dc.contributor.author: Allen, Andrea J.
dc.contributor.author: Rao, Suchitra
dc.contributor.author: Rogerson, Colin
dc.contributor.author: Bennett, Tellen D.
dc.contributor.author: Morizono, Hiroki
dc.contributor.author: Eckrich, Daniel
dc.contributor.author: Jhaveri, Ravi
dc.contributor.author: Huang, Yungui
dc.contributor.author: Ranade, Daksha
dc.contributor.author: Pajor, Nathan
dc.contributor.author: Lee, Grace M.
dc.contributor.author: Forrest, Christopher B.
dc.contributor.author: Bailey, L. Charles
dc.contributor.department: Pediatrics, School of Medicine
dc.date.accessioned: 2023-10-19T15:54:45Z
dc.date.available: 2023-10-19T15:54:45Z
dc.date.issued: 2022-12-26
dc.description.abstract: Background: As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method that reliably identifies patients who are likely to have PASC in health systems data. Methods and findings: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and assessed model performance using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. Conclusions: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. With appropriate threshold settings, the model can identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may help clarify which features are associated with PASC diagnoses.
dc.description.sponsorship: Funding source: This research was funded by the National Institutes of Health (NIH) Agreement OT2HL161847-01 as part of the Researching COVID to Enhance Recovery (RECOVER) program of research.
dc.identifier.citation: Lorman V, Razzaghi H, Song X, et al. A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program. Preprint. medRxiv. 2022;2022.12.22.22283791. Published 2022 Dec 26. doi:10.1101/2022.12.22.22283791
dc.identifier.uri: https://hdl.handle.net/1805/36504
dc.language.iso: en_US
dc.publisher: Cold Spring Harbor Laboratory
dc.relation.isversionof: 10.1101/2022.12.22.22283791
dc.relation.journal: medRxiv
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source: PMC
dc.subject: SARS-CoV2 infection
dc.subject: Long COVID
dc.subject: Post-Acute Sequelae
dc.title: A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program
dc.type: Article
Files
Original bundle
Name:
nihpp-2022.12.22.22283791v1.pdf
Size:
537.85 KB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.99 KB
Format:
Item-specific license agreed upon to submission