Automated LOINC Mapping with Biomedical NLP Models: Enabling Scalable Health Information Exchange via the Open Concept Lab
Abstract
Objectives: Efficient exchange of health information requires consistent representation of clinical concepts across laboratories, hospitals, and public health systems. LOINC supports this interoperability by standardizing laboratory test codes, but mapping remains difficult when datasets are incomplete, inconsistently formatted, or structurally diverse. These challenges often create a gap between an algorithm's performance in controlled settings and its performance in real-world deployment. This study aimed to develop a biomedical natural language processing (NLP) approach for mapping heterogeneous laboratory test strings to LOINC v2.81 and to compare its performance with established algorithms in the Open Concept Lab (OCL) Mapper.
Materials and Methods: We implemented a ScispaCy-based pipeline (ScispaCy-LOINC) that identifies clinical entities, links them to UMLS Concept Unique Identifiers, assembles LOINC codes from LOINC parts, and ranks candidates using a weighted scoring system. Overall and ranked performance was evaluated against 2 OCL algorithms, Elasticsearch Keyword Retrieval (OCL-Keyword) and MiniLM Semantic Search (OCL-Semantic), on 2 datasets: MIMIC-IV lab_d_items and a LOINC-mapped subset of the CIEL interface terminology v2025-07-15.
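The weighted ranking step described above can be illustrated with a minimal sketch. It assumes each candidate is described by the six standard LOINC axes (component, property, time, system, scale, method) and that a candidate earns an axis's weight when its part matches the part extracted from the input string. The weight values, the function names score_candidate and rank_candidates, and the placeholder codes X-1 and X-2 are hypothetical; the abstract does not report the weights or matching rules the study actually used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-axis weights (assumption: not taken from the study).
AXIS_WEIGHTS: Dict[str, float] = {
    "component": 0.40,
    "system": 0.20,
    "property": 0.15,
    "scale": 0.10,
    "time": 0.10,
    "method": 0.05,
}


def score_candidate(
    extracted: Dict[str, Optional[str]],
    candidate: Dict[str, str],
    weights: Dict[str, float] = AXIS_WEIGHTS,
) -> float:
    """Sum the weights of every axis where the extracted part equals the candidate's part."""
    total = 0.0
    for axis, weight in weights.items():
        value = extracted.get(axis)
        # Axes that were not extracted from the input contribute nothing.
        if value is not None and value.lower() == candidate.get(axis, "").lower():
            total += weight
    return total


def rank_candidates(
    extracted: Dict[str, Optional[str]],
    candidates: Dict[str, Dict[str, str]],
    weights: Dict[str, float] = AXIS_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (code, score) pairs, highest score first; ties are broken by code."""
    scored = [
        (code, score_candidate(extracted, parts, weights))
        for code, parts in candidates.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Placeholder candidates (hypothetical codes, not real LOINC identifiers).
candidates = {
    "X-1": {"component": "Glucose", "system": "Ser/Plas", "property": "MCnc",
            "scale": "Qn", "time": "Pt"},
    "X-2": {"component": "Glucose", "system": "Bld", "property": "MCnc",
            "scale": "Qn", "time": "Pt"},
}
extracted = {"component": "glucose", "system": "ser/plas"}
ranked = rank_candidates(extracted, candidates)
```

In this sketch a sparse input that yields only a component and a system still produces a ranking, because missing axes are skipped and do not count against a candidate.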
Results: In MIMIC-IV, ScispaCy-LOINC achieved the highest coverage, correctly identifying the LOINC code in 42.3% of cases and outperforming OCL-Keyword (19.5%) and OCL-Semantic (21.4%). In the CIEL dataset, OCL-Semantic achieved the highest coverage (54.4%), followed by OCL-Keyword (46.9%) and ScispaCy-LOINC (28.4%).
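As a rough illustration of how coverage figures of this kind can be computed, the sketch below assumes coverage means the share of test strings whose reference LOINC code appears in an algorithm's ranked candidate list, optionally restricted to the top k candidates for ranked performance. The abstract does not give the exact definition, so this reading, the function name coverage_at_k, and the placeholder codes are assumptions.

```python
from typing import Dict, List, Optional


def coverage_at_k(
    ranked: Dict[str, List[str]],
    gold: Dict[str, str],
    k: Optional[int] = None,
) -> float:
    """Fraction of gold items whose reference code is among the first k candidates.

    With k=None the whole candidate list is searched (overall coverage).
    Items with no candidates at all count as misses.
    """
    if not gold:
        raise ValueError("gold must contain at least one labelled test string")
    hits = 0
    for test_id, true_code in gold.items():
        candidate_codes = ranked.get(test_id, [])
        top = candidate_codes if k is None else candidate_codes[:k]
        if true_code in top:
            hits += 1
    return hits / len(gold)


# Toy data with placeholder codes (hypothetical, not study data).
gold = {"a": "X-1", "b": "X-2", "c": "X-3", "d": "X-4"}
ranked = {
    "a": ["X-1", "X-9"],  # correct at rank 1
    "b": ["X-9", "X-2"],  # correct at rank 2
    "c": ["X-8"],         # wrong candidate
    # "d" has no candidates and counts as a miss
}
overall = coverage_at_k(ranked, gold)
top1 = coverage_at_k(ranked, gold, k=1)
```

Dividing by the number of gold items, and not by the number of items that received candidates, keeps unmapped inputs in the denominator, which matters when comparing algorithms that abstain at different rates.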
Discussion: These results indicate that ScispaCy-LOINC is particularly effective for noisier or structurally sparse inputs, whereas OCL-based approaches perform better for more standardized terminologies, highlighting complementary algorithmic strengths.
Conclusion: ScispaCy-LOINC offers a flexible approach to LOINC mapping and demonstrates complementary strengths relative to existing OCL algorithms. These findings support the development of an integrated framework that combines algorithmic strategies to improve robustness across diverse clinical datasets.
