ScholarWorksIndianapolis
  • Communities & Collections
  • Browse ScholarWorks
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Italiano
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Tiếng Việt
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log In
    or
    New user? Click here to register.Have you forgotten your password?
  1. Home
  2. Browse by Author

Browsing by Author "Hogan, William R."

Now showing 1 - 4 of 4
Results Per Page
Sort Options
  • Loading...
    Thumbnail Image
    Item
    A large language model for electronic health records
    (Springer Nature, 2022-12-26) Yang, Xi; Chen, Aokun; PourNejatian, Nima; Shin, Hoo Chang; Smith, Kaleb E.; Parisien, Christopher; Compas, Colin; Martin, Cheryl; Costa, Anthony B.; Flores, Mona G.; Zhang, Ying; Magoc, Tanja; Harle, Christopher A.; Lipori, Gloria; Mitchell, Duane A.; Hogan, William R.; Shenkman, Elizabeth A.; Bian, Jiang; Wu, Yonghui; Health Policy and Management, Richard M. Fairbanks School of Public Health
    There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og
  • Loading...
    Thumbnail Image
    Item
    Classifying early infant feeding status from clinical notes using natural language processing and machine learning
    (Springer Nature, 2024-04-03) Lemas, Dominick J.; Du, Xinsong; Rouhizadeh, Masoud; Lewis, Braeden; Frank, Simon; Wright, Lauren; Spirache, Alex; Gonzalez, Lisa; Cheves, Ryan; Magalhães, Marina; Zapata, Ruben; Reddy, Rahul; Xu, Ke; Parker, Leslie; Harle, Chris; Young, Bridget; Louis‑Jaques, Adetola; Zhang, Bouri; Thompson, Lindsay; Hogan, William R.; Modave, François; Health Policy and Management, Richard M. Fairbanks School of Public Health
    The objective of this study is to develop and evaluate natural language processing (NLP) and machine learning models to predict infant feeding status from clinical notes in the Epic electronic health records system. The primary outcome was the classification of infant feeding status from clinical notes using Medical Subject Headings (MeSH) terms. Annotation of notes was completed using TeamTat to uniquely classify clinical notes according to infant feeding status. We trained 6 machine learning models to classify infant feeding status: logistic regression, random forest, XGBoost gradient descent, k-nearest neighbors, and support-vector classifier. Model comparison was evaluated based on overall accuracy, precision, recall, and F1 score. Our modeling corpus included an even number of clinical notes that was a balanced sample across each class. We manually reviewed 999 notes that represented 746 mother-infant dyads with a mean gestational age of 38.9 weeks and a mean maternal age of 26.6 years. The most frequent feeding status classification present for this study was exclusive breastfeeding [n = 183 (18.3%)], followed by exclusive formula bottle feeding [n = 146 (14.6%)], and exclusive feeding of expressed mother’s milk [n = 102 (10.2%)], with mixed feeding being the least frequent [n = 23 (2.3%)]. Our final analysis evaluated the classification of clinical notes as breast, formula/bottle, and missing. The machine learning models were trained on these three classes after performing balancing and down sampling. The XGBoost model outperformed all others by achieving an accuracy of 90.1%, a macro-averaged precision of 90.3%, a macro-averaged recall of 90.1%, and a macro-averaged F1 score of 90.1%. Our results demonstrate that natural language processing can be applied to clinical notes stored in the electronic health records to classify infant feeding status. Early identification of breastfeeding status using NLP on unstructured electronic health records data can be used to inform precision public health interventions focused on improving lactation support for postpartum patients.
  • Loading...
    Thumbnail Image
    Item
    Sustainability considerations for clinical and translational research informatics infrastructure
    (Cambridge University Press, 2018-10) Obeid, Jihad S.; Tarczy-Hornoch, Peter; Harris, Paul A.; Barnett, William K.; Anderson, Nicholas R.; Embi, Peter J.; Hogan, William R.; Bell, Douglas S.; McIntosh, Leslie D.; Knosp, Boyd; Tachinardi, Umberto; Cimino, James J.; Wehbe, Firas H.; Medicine, School of Medicine
    A robust biomedical informatics infrastructure is essential for academic health centers engaged in translational research. There are no templates for what such an infrastructure encompasses or how it is funded. An informatics workgroup within the Clinical and Translational Science Awards network conducted an analysis to identify the scope, governance, and funding of this infrastructure. After we identified the essential components of an informatics infrastructure, we surveyed informatics leaders at network institutions about the governance and sustainability of the different components. Results from 42 survey respondents showed significant variations in governance and sustainability; however, some trends also emerged. Core informatics components such as electronic data capture systems, electronic health records data repositories, and related tools had mixed models of funding including, fee-for-service, extramural grants, and institutional support. Several key components such as regulatory systems (e.g., electronic Institutional Review Board [IRB] systems, grants, and contracts), security systems, data warehouses, and clinical trials management systems were overwhelmingly supported as institutional infrastructure. The findings highlighted in this report are worth noting for academic health centers and funding agencies involved in planning current and future informatics infrastructure, which provides the foundation for a robust, data-driven clinical and translational research program.
  • Loading...
    Thumbnail Image
    Item
    The OneFlorida Data Trust: a centralized, translational research data infrastructure of statewide scope
    (Oxford University Press, 2022) Hogan, William R.; Shenkman, Elizabeth A.; Robinson, Temple; Carasquillo, Olveen; Robinson, Patricia S.; Essner, Rebecca Z.; Bian, Jiang; Lipori, Gigi; Harle, Christopher; Magoc, Tanja; Manini, Lizabeth; Mendoza, Tona; White, Sonya; Loiacono, Alex; Hall, Jackie; Nelson, Dave; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
    The OneFlorida Data Trust is a centralized research patient data repository created and managed by the OneFlorida Clinical Research Consortium (“OneFlorida”). It comprises structured electronic health record (EHR), administrative claims, tumor registry, death, and other data on 17.2 million individuals who received healthcare in Florida between January 2012 and the present. Ten healthcare systems in Miami, Orlando, Tampa, Jacksonville, Tallahassee, Gainesville, and rural areas of Florida contribute EHR data, covering the major metropolitan regions in Florida. Deduplication of patients is accomplished via privacy-preserving entity resolution (precision 0.97–0.99, recall 0.75), thereby linking patients’ EHR, claims, and death data. Another unique feature is the establishment of mother-baby relationships via Florida vital statistics data. Research usage has been significant, including major studies launched in the National Patient-Centered Clinical Research Network (“PCORnet”), where OneFlorida is 1 of 9 clinical research networks. The Data Trust’s robust, centralized, statewide data are a valuable and relatively unique research resource.
About IU Indianapolis ScholarWorks
  • Accessibility
  • Privacy Notice
  • Copyright © 2025 The Trustees of Indiana University