Browsing by Author "Morizono, Hiroki"
A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program
(Cold Spring Harbor Laboratory, 2022-12-26)
Lorman, Vitaly; Razzaghi, Hanieh; Song, Xing; Morse, Keith; Utidjian, Levon; Allen, Andrea J.; Rao, Suchitra; Rogerson, Colin; Bennett, Tellen D.; Morizono, Hiroki; Eckrich, Daniel; Jhaveri, Ravi; Huang, Yungui; Ranade, Daksha; Pajor, Nathan; Lee, Grace M.; Forrest, Christopher B.; Bailey, L. Charles
Pediatrics, School of Medicine

Background: As clinical understanding of pediatric Post-Acute Sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method that reliably identifies patients who are likely to have PASC in health systems data.

Methods and findings: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using Shapley Additive exPlanation (SHAP) values.

Conclusions: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data.
Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.

EHR-based Case Identification of Pediatric Long COVID: A Report from the RECOVER EHR Cohort
(medRxiv, 2024-05-23)
Botdorf, Morgan; Dickinson, Kimberley; Lorman, Vitaly; Razzaghi, Hanieh; Marchesani, Nicole; Rao, Suchitra; Rogerson, Colin; Higginbotham, Miranda; Mejias, Asuncion; Salyakina, Daria; Thacker, Deepika; Dandachi, Dima; Christakis, Dimitri A.; Taylor, Emily; Schwenk, Hayden; Morizono, Hiroki; Cogen, Jonathan; Pajor, Nate M.; Jhaveri, Ravi; Forrest, Christopher B.; Bailey, L. Charles; RECOVER Consortium
Pediatrics, School of Medicine

Objective: Long COVID, marked by persistent, recurring, or new symptoms after COVID-19 infection, impacts children's well-being yet lacks a unified clinical definition. This study evaluates the performance of an empirically derived Long COVID case identification algorithm, or computable phenotype, against manual chart review in a pediatric sample. This approach aims to facilitate large-scale research efforts to understand this condition better.

Methods: The algorithm, composed of diagnostic codes empirically associated with Long COVID, was applied to a cohort of pediatric patients with SARS-CoV-2 infection in the RECOVER PCORnet EHR database. The algorithm classified 31,781 patients as having conclusive, probable, or possible Long COVID and 307,686 patients as having no evidence of Long COVID. A chart review was performed on a subset of patients (n=651) to determine the overlap between the two methods.
Instances of discordance were reviewed to understand the reasons for differences.

Results: The sample comprised 651 pediatric patients (339 females, mean age = 10.10 years) across 16 hospital systems. Results showed moderate overlap between phenotype and chart review Long COVID identification (accuracy = 0.62, PPV = 0.49, NPV = 0.75); however, there were also numerous cases of disagreement. No notable differences were found when the analyses were stratified by age at infection or era of infection. Further examination of the discordant cases revealed that the most common cause of disagreement was the clinician reviewers' tendency to attribute Long COVID-like symptoms to prior medical conditions. The performance of the phenotype improved when prior medical conditions were considered (accuracy = 0.71, PPV = 0.65, NPV = 0.74).

Conclusions: Although there was moderate overlap between the two methods, the discrepancies between the two sources are likely attributable to the lack of consensus on a Long COVID clinical definition. It is essential to consider the strengths and limitations of each method when developing Long COVID classification algorithms.
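
For readers unfamiliar with the agreement measures quoted in the abstract above (accuracy, PPV, NPV), the following is a minimal Python sketch of how such figures are typically computed when phenotype labels are compared against chart review as the reference. It is not the authors' code: the function name `agreement_metrics`, the `Agreement` type, and the ten toy labels are hypothetical and chosen only for illustration.

```python
import math
from typing import NamedTuple, Sequence


class Agreement(NamedTuple):
    accuracy: float  # share of patients where both methods agree
    ppv: float       # share of phenotype-positive patients confirmed by chart review
    npv: float       # share of phenotype-negative patients confirmed negative by chart review


def agreement_metrics(phenotype: Sequence[bool], chart_review: Sequence[bool]) -> Agreement:
    """Compare phenotype labels with chart review labels (treated as the reference)."""
    if len(phenotype) != len(chart_review):
        raise ValueError("label sequences must have the same length")
    if not phenotype:
        raise ValueError("at least one patient is required")

    tp = fp = fn = tn = 0
    for flagged, confirmed in zip(phenotype, chart_review):
        if flagged and confirmed:
            tp += 1
        elif flagged:
            fp += 1
        elif confirmed:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined when no patient falls in the denominator group.
        return numerator / denominator if denominator else math.nan

    return Agreement(
        accuracy=ratio(tp + tn, tp + fp + fn + tn),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Hypothetical toy labels for ten patients (NOT study data).
toy_phenotype = [True] * 5 + [False] * 5
toy_chart_review = [True, True, True, True, False, True, True, False, False, False]
result = agreement_metrics(toy_phenotype, toy_chart_review)
print(result)
```

Because PPV and NPV depend on how many reviewed patients truly have the condition, values computed on a chart-review subset describe that subset and may not carry over unchanged to the full cohort.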