ScholarWorks Indianapolis

Browsing by Author "Faxvaag, Arild"

Now showing 1 - 3 of 3
  • A Systematic Approach to Configuring MetaMap for Optimal Performance
    (Thieme, 2022) Jing, Xia; Indani, Akash; Hubig, Nina; Min, Hua; Gong, Yang; Cimino, James J.; Sittig, Dean F.; Rennert, Lior; Robinson, David; Biondich, Paul; Wright, Adam; Nøhr, Christian; Law, Timothy; Faxvaag, Arild; Gimbel, Ronald; Pediatrics, School of Medicine
    Background: MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurable, configuration decisions are not straightforward. Objective: To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods: MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β = 1) were calculated. Results: The percentages of exact matches and missing gold-standard terms were 0.6-0.79 and 0.09-0.3 for one behavior option, and 0.56-0.8 and 0.09-0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively. Conclusion: We demonstrated a systematic approach that provides objective and accurate evidence to guide MetaMap configuration for optimal performance. Combining objective evidence with the current practice of relying on principles, experience, and intuition outperforms using a single strategy to configure MetaMap. Our methodology, reference code, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap.
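The abstract above relies on two standard measures: cosine similarity between vectors and the F-measure with β = 1. The following is a minimal sketch of both, using only the textbook definitions. It is not the authors' reference code, and the function names `cosine_similarity` and `f_measure` are illustrative assumptions. Soft cosine similarity, which additionally weights pairs of related terms, is not covered here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def f_measure(precision, recall, beta=1.0):
    """F-measure; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_measure(1.00, 0.53)` evaluates to roughly 0.69, in line with the value the abstract reports for missing terms.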
  • Keyphrase Identification Using Minimal Labeled Data with Hierarchical Context and Transfer Learning
    (medRxiv, 2023-05-26) Goli, Rohan; Hubig, Nina; Min, Hua; Gong, Yang; Sittig, Dean F.; Rennert, Lior; Robinson, David; Biondich, Paul; Wright, Adam; Nøhr, Christian; Law, Timothy; Faxvaag, Arild; Weaver, Aneesa; Gimbel, Ronald; Jing, Xia; Pediatrics, School of Medicine
    Interoperable clinical decision support system (CDSS) rules provide a pathway to interoperability, a well-recognized challenge in health information technology. Building an ontology facilitates creating interoperable CDSS rules, which can be achieved by identifying keyphrases (KPs) from the existing literature. However, KP identification for data labeling requires human expertise, consensus, and contextual understanding. This paper presents a semi-supervised KP identification framework that uses minimal labeled data and is based on hierarchical attention over documents and domain adaptation. Our method outperforms prior neural architectures by learning through synthetic labels for initial training, document-level contextual learning, language modeling, and fine-tuning with limited gold-standard labeled data. To the best of our knowledge, this is the first functional framework for identifying KPs in the CDSS sub-domain that is trained on limited labeled data. It contributes to general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging and lightweight deep learning models can support real-time KP identification as a complement to human experts' efforts.
  • Vaccination Schedules Recommended by the Centers for Disease Control and Prevention: From Human-Readable to Machine-Processable
    (MDPI, 2025-04-22) Jing, Xia; Min, Hua; Gong, Yang; Ernst, Mytchell A.; Weaver, Aneesa; Crozier, Chloe; Robinson, David; Sittig, Dean F.; Biondich, Paul G.; Orlioglu, Samuil; Boobalan, Akash Shanmugan; Abanyie, Kojo; Boyce, Richard D.; Wright, Adam; Nøhr, Christian; Law, Timothy D.; Faxvaag, Arild; Rennert, Lior; Gimbel, Ronald W.; Pediatrics, School of Medicine
    Background: Reusable, machine-processable clinical decision support system (CDSS) rules have not been widely achieved in the medical informatics field. This study introduces the process, results, challenges faced, and lessons learned while converting the United States of America Centers for Disease Control and Prevention (CDC)-recommended immunization schedules (2022) to machine-processable CDSS rules. Methods: We converted the vaccination schedules into tabular, chart, MS Excel, and Clinical Quality Language (CQL) formats. Because the CQL format can be automatically converted to a machine-processable format using existing tools, it was regarded as a machine-processable format. The results were reviewed, verified, and tested. Results: We developed 465 rules for 19 vaccines in 13 categories and shared the rules via GitHub to make them publicly available. We used cross-review and cross-checking to validate the CDSS rules in tabular and chart formats. The CQL files were tested for syntax and logic with hypothetical patient HL7 FHIR resources. Our rules can be reused and shared by the health IT industry, CDSS developers, medical informatics educators, or clinical care institutions. The unique contributions of our work are twofold: (1) we created ontology-based, machine-processable, and reusable immunization recommendation rules, and (2) we created and publicly shared multiple formats of immunization recommendation rules, which can be a valuable resource for the medical and medical informatics communities. Conclusions: These CDSS rules can be important contributions to informatics communities, reducing redundant efforts, which is particularly significant in resource-limited settings. Despite the maturity and concise presentation of the CDC recommendations, careful attention and multiple layers of verification and review are necessary to ensure accurate conversion. The publicly shared CDSS rules can also be used for health and biomedical informatics education and training purposes.
Copyright © 2025 The Trustees of Indiana University