ScholarWorks Indianapolis
Browsing by Author "Xu, Ke"

  • Classifying early infant feeding status from clinical notes using natural language processing and machine learning
    (Springer Nature, 2024-04-03) Lemas, Dominick J.; Du, Xinsong; Rouhizadeh, Masoud; Lewis, Braeden; Frank, Simon; Wright, Lauren; Spirache, Alex; Gonzalez, Lisa; Cheves, Ryan; Magalhães, Marina; Zapata, Ruben; Reddy, Rahul; Xu, Ke; Parker, Leslie; Harle, Chris; Young, Bridget; Louis‑Jaques, Adetola; Zhang, Bouri; Thompson, Lindsay; Hogan, William R.; Modave, François; Health Policy and Management, Richard M. Fairbanks School of Public Health
    The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother’s milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. 
    Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
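
    The abstract above reports accuracy alongside macro-averaged precision, recall, and F1 for a three-class problem. As a minimal sketch of how such figures are commonly computed (not the authors' code), the following derives them from a confusion matrix. The function name `macro_metrics` and the counts in `example` are hypothetical and chosen only for illustration; they are not data from the study.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of notes whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro averaging: unweighted mean of the per-class scores.
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical counts for three balanced classes (10 notes each),
# e.g. breast, formula/bottle, and missing. Not data from the study.
example = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
metrics = macro_metrics(example)
print(metrics)
```

    When every class has the same number of notes, as in a balanced, down-sampled corpus, macro-averaged recall equals overall accuracy by construction, which is consistent with the identical 90.1% values reported for both.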
  • Multi-ancestry study of the genetics of problematic alcohol use in over 1 million individuals
    (Springer Nature, 2023) Zhou, Hang; Kember, Rachel L.; Deak, Joseph D.; Xu, Heng; Toikumo, Sylvanus; Yuan, Kai; Lind, Penelope A.; Farajzadeh, Leila; Wang, Lu; Hatoum, Alexander S.; Johnson, Jessica; Lee, Hyunjoon; Mallard, Travis T.; Xu, Jiayi; Johnston, Keira J. A.; Johnson, Emma C.; Galimberti, Marco; Dao, Cecilia; Levey, Daniel F.; Overstreet, Cassie; Byrne, Enda M.; Gillespie, Nathan A.; Gordon, Scott; Hickie, Ian B.; Whitfield, John B.; Xu, Ke; Zhao, Hongyu; Huckins, Laura M.; Davis, Lea K.; Sanchez-Roige, Sandra; Madden, Pamela A. F.; Heath, Andrew C.; Medland, Sarah E.; Martin, Nicholas G.; Ge, Tian; Smoller, Jordan W.; Hougaard, David M.; Børglum, Anders D.; Demontis, Ditte; Krystal, John H.; Gaziano, J. Michael; Edenberg, Howard J.; Agrawal, Arpana; Million Veteran Program; Justice, Amy C.; Stein, Murray B.; Kranzler, Henry R.; Gelernter, Joel; Biochemistry and Molecular Biology, School of Medicine
    Problematic alcohol use (PAU), a trait that combines alcohol use disorder and alcohol-related problems assessed with a questionnaire, is a leading cause of death and morbidity worldwide. Here we conducted a large cross-ancestry meta-analysis of PAU in 1,079,947 individuals (European, N = 903,147; African, N = 122,571; Latin American, N = 38,962; East Asian, N = 13,551; and South Asian, N = 1,716 ancestries). We observed a high degree of cross-ancestral similarity in the genetic architecture of PAU and identified 110 independent risk variants in within- and cross-ancestry analyses. Cross-ancestry fine mapping improved the identification of likely causal variants. Prioritizing genes through gene expression and chromatin interaction in brain tissues identified multiple genes associated with PAU. We identified existing medications for potential pharmacological studies by a computational drug repurposing analysis. Cross-ancestry polygenic risk scores showed better performance of association in independent samples than single-ancestry polygenic risk scores. Genetic correlations between PAU and other traits were observed in multiple ancestries, with other substance use traits having the highest correlations. This study advances our knowledge of the genetic etiology of PAU, and these findings may bring possible clinical applicability of genetics insights, together with neuroscience, biology and data science, closer.