NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding

dc.contributor.author: Wang, Kanix
dc.contributor.author: Stevens, Robert
dc.contributor.author: Alachram, Halima
dc.contributor.author: Li, Yu
dc.contributor.author: Soldatova, Larisa
dc.contributor.author: King, Ross
dc.contributor.author: Ananiadou, Sophia
dc.contributor.author: Schoene, Annika M.
dc.contributor.author: Li, Maolin
dc.contributor.author: Christopoulou, Fenia
dc.contributor.author: Ambite, José Luis
dc.contributor.author: Matthew, Joel
dc.contributor.author: Garg, Sahil
dc.contributor.author: Hermjakob, Ulf
dc.contributor.author: Marcu, Daniel
dc.contributor.author: Sheng, Emily
dc.contributor.author: Beißbarth, Tim
dc.contributor.author: Wingender, Edgar
dc.contributor.author: Galstyan, Aram
dc.contributor.author: Gao, Xin
dc.contributor.author: Chambers, Brendan
dc.contributor.author: Pan, Weidi
dc.contributor.author: Khomtchouk, Bohdan B.
dc.contributor.author: Evans, James A.
dc.contributor.author: Rzhetsky, Andrey
dc.contributor.department: Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
dc.date.accessioned: 2025-03-11T13:17:47Z
dc.date.available: 2025-03-11T13:17:47Z
dc.date.issued: 2021-10-20
dc.description.abstract: Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades [1,2], the most dramatic advances in MR have followed in the wake of critical corpus development [3]. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet [4] was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, automated named entity recognition (NER) extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.
dc.eprint.version: Final published version
dc.identifier.citation: Wang K, Stevens R, Alachram H, et al. NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding. NPJ Syst Biol Appl. 2021;7(1):38. Published 2021 Oct 20. doi:10.1038/s41540-021-00200-x
dc.identifier.uri: https://hdl.handle.net/1805/46313
dc.language.iso: en_US
dc.publisher: Springer Nature
dc.relation.isversionof: 10.1038/s41540-021-00200-x
dc.relation.journal: NPJ: Systems Biology and Applications
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.source: PMC
dc.subject: Software
dc.subject: Diseases
dc.subject: Machine reading (MR)
dc.title: NERO: a biomedical named-entity (recognition) ontology with a large, annotated corpus reveals meaningful associations through text embedding
dc.type: Article
Files
Original bundle
Name: Wang2021Biomedical-CCBY.pdf
Size: 2.68 MB
Format: Adobe Portable Document Format