Automated LOINC Mapping with Biomedical NLP Models: Enabling Scalable Health Information Exchange via the Open Concept Lab

Date
2026
Language
American English
Found At
Oxford University Press
Abstract

Objectives: Efficient exchange of health information requires consistent representation of clinical concepts across laboratories, hospitals, and public health systems. Logical Observation Identifiers Names and Codes (LOINC) supports this interoperability by standardizing laboratory test codes, but mapping remains difficult when datasets are incomplete, inconsistently formatted, or structurally diverse. These challenges often create a mismatch between algorithmic performance in controlled settings and performance in real-world deployment. This study aimed to develop a biomedical natural language processing (NLP) approach for mapping heterogeneous laboratory test strings to LOINC v2.81 and to compare its performance with established algorithms in the Open Concept Lab (OCL) Mapper.

Materials and Methods: We implemented a ScispaCy-based pipeline (ScispaCy-LOINC) that identifies clinical entities, links them to UMLS Concept Unique Identifiers, assembles LOINC codes from LOINC parts, and ranks candidates using a weighted scoring system. Overall and ranked performance was evaluated against 2 OCL algorithms, Elasticsearch Keyword Retrieval (OCL-Keyword) and MiniLM Semantic Search (OCL-Semantic), on 2 datasets: MIMIC-IV lab_d_items and a LOINC-mapped subset of the CIEL interface terminology v2025-07-15.
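The ranking step described above can be illustrated with a minimal sketch. It assumes each candidate LOINC code is represented by the UMLS CUIs of its parts, and that a candidate earns a weight for every axis whose CUI also appears among the CUIs extracted from the input string. The weights, the Candidate class, and the placeholder codes and CUIs are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field

# Hypothetical per-axis weights (assumption; the study's weights are not stated here).
PART_WEIGHTS = {
    "component": 0.40,
    "system": 0.25,
    "property": 0.15,
    "scale": 0.10,
    "time": 0.05,
    "method": 0.05,
}


@dataclass
class Candidate:
    """A candidate LOINC code with one linked CUI per LOINC axis (illustrative)."""
    loinc_code: str
    parts: dict = field(default_factory=dict)  # axis name -> CUI


def score(candidate: Candidate, query_cuis: set) -> float:
    """Sum the weights of the axes whose CUI was found in the input string."""
    return sum(
        PART_WEIGHTS.get(axis, 0.0)
        for axis, cui in candidate.parts.items()
        if cui in query_cuis
    )


def rank_candidates(candidates: list, query_cuis: set) -> list:
    """Return candidates sorted by descending score, ties broken by code."""
    return sorted(candidates, key=lambda c: (-score(c, query_cuis), c.loinc_code))


# Placeholder codes and CUIs, not real LOINC or UMLS identifiers.
candidates = [
    Candidate("CODE-A", {"component": "CUI-GLUCOSE", "system": "CUI-BLOOD"}),
    Candidate("CODE-B", {"component": "CUI-GLUCOSE", "system": "CUI-SERUM"}),
    Candidate("CODE-C", {"component": "CUI-SODIUM", "system": "CUI-SERUM"}),
]
query_cuis = {"CUI-GLUCOSE", "CUI-SERUM"}  # CUIs linked from the test string
ranked = [c.loinc_code for c in rank_candidates(candidates, query_cuis)]
```

In this toy input, CODE-B matches on both component and system and ranks first. In a real pipeline the CUIs would come from an entity linker rather than being supplied by hand.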

Results: In MIMIC-IV, ScispaCy-LOINC achieved the highest coverage, correctly identifying the LOINC code in 42.3% of cases and outperforming OCL-Keyword (19.5%) and OCL-Semantic (21.4%). In the CIEL dataset, OCL-Semantic achieved the highest coverage (54.4%), followed by OCL-Keyword (46.9%) and ScispaCy-LOINC (28.4%).
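One way to compute figures of this kind is sketched below. It assumes that coverage means the fraction of test strings whose reference LOINC code appears anywhere in an algorithm's candidate list, and that ranked performance restricts the check to the top k candidates. The abstract does not give the exact definition, so the function name coverage_at_k and this interpretation are assumptions.

```python
from typing import Optional, Sequence


def coverage_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_codes: Sequence[str],
    k: Optional[int] = None,
) -> float:
    """Fraction of cases whose gold code is among the top-k candidates.

    With k=None the whole candidate list is searched (overall coverage).
    """
    if len(ranked_predictions) != len(gold_codes):
        raise ValueError("predictions and gold codes must have the same length")
    if not gold_codes:
        return 0.0
    hits = 0
    for predictions, gold in zip(ranked_predictions, gold_codes):
        considered = predictions if k is None else predictions[:k]
        if gold in considered:
            hits += 1
    return hits / len(gold_codes)


# Placeholder codes: three test strings with their ranked candidate lists.
predictions = [["A", "B"], ["C"], []]
gold = ["B", "C", "D"]
overall = coverage_at_k(predictions, gold)    # gold found in 2 of 3 lists
top1 = coverage_at_k(predictions, gold, k=1)  # gold ranked first in 1 of 3
```

Reporting both values separates an algorithm that retrieves the right code somewhere in its list from one that also ranks it first.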

Discussion: These results indicate that ScispaCy-LOINC is particularly effective for noisier or structurally sparse inputs, whereas OCL-based approaches perform better for more standardized terminologies, highlighting complementary algorithmic strengths.

Conclusion: ScispaCy-LOINC offers a flexible approach to LOINC mapping and demonstrates complementary strengths relative to existing OCL algorithms. These findings support the development of an integrated framework that combines algorithmic strategies to improve robustness across diverse clinical datasets.

Cite As
Naliyatthaliyazchayil P, Sangam VR, Amlung J, Kanter AS, Purkayastha S, Payne J. Automated Logical Observation Identifiers Names and Codes mapping with biomedical natural language processing models: enabling scalable health information exchange via the Open Concept Lab. Journal of the American Medical Informatics Association. 2026 Feb 11:ocag010.
Type
Article