Pancreatic Cysts Identification Using Unstructured Information Management Architecture

dc.contributor.authorMehrabi, Saeed
dc.contributor.authorSchmidt, C. Max
dc.contributor.authorWaters, Joshua A.
dc.contributor.authorBeesley, Chris
dc.contributor.authorKrishnan, Anand
dc.contributor.authorKesterson, Joe
dc.contributor.authorDexter, Paul
dc.contributor.authorAl-Haddad, Mohammed A.
dc.contributor.authorPalakal, Mathew
dc.date.accessioned2015-10-02T13:16:19Z
dc.date.available2015-10-02T13:16:19Z
dc.date.issued2013-04-05
dc.descriptionposter abstracten_US
dc.description.abstractPancreatic cancer is one of the deadliest cancers and is mostly diagnosed at late stages. Patients with pancreatic cysts are at higher risk of developing cancer, and surveillance of these patients can help with early diagnosis. Much information about pancreatic cysts is found as free text in various medical narratives. In this retrospective study, a corpus of 1064 records from 44 patients at Indiana University Hospital from 1990 to 2012 was collected. A natural language processing system was developed and used to identify patients with pancreatic cysts. The input goes through a series of tasks within the Unstructured Information Management Architecture (UIMA) framework: report separation, metadata detection, sentence detection, concept annotation, and writing into the database. Metadata such as medical record number (MRN), report ID, report name, report date, and report body were extracted from each report. Sentences were detected, and concepts within each sentence were extracted using regular expressions. A regular expression is a pattern of characters that matches specific strings of text. Our medical team assembled the concepts used to identify pancreatic cysts in medical reports, and additional keywords were added by searching the literature and the Unified Medical Language System (UMLS) knowledge base. The NegEx algorithm was used to determine the negation status of concepts. The 1064 reports were divided into training and test sets. Two pancreatic-cyst surgeons created the gold standard data (inter-annotator agreement K=88%). The training set was analyzed to refine the regular expressions. Concept identification using the NegEx algorithm resulted in precision and recall of 98.9% and 89%, respectively. To improve the performance of negation detection, the Stanford Dependency Parser (SDP) was used. SDP identifies how the words in a sentence are related to each other. The SDP-based negation algorithm improved the recall to 95.7%.en_US
dc.identifier.citationMehrabi, Saeed, C. Max Schmidt, Joshua A. Waters, Chris Beesley, Anand Krishnan, Joe Kesterson, Paul Dexter, Mohammed A. Al-Haddad, and Mathew Palakal. (2013, April 5). Pancreatic Cysts Identification Using Unstructured Information Management Architecture. Poster session presented at IUPUI Research Day 2013, Indianapolis, Indiana.en_US
dc.identifier.urihttps://hdl.handle.net/1805/7106
dc.language.isoen_USen_US
dc.publisherOffice of the Vice Chancellor for Researchen_US
dc.subjectPancreatic cystsen_US
dc.subjectunstructured information management architectureen_US
dc.subjectPancreatic canceren_US
dc.subjectregular expressionen_US
dc.titlePancreatic Cysts Identification Using Unstructured Information Management Architectureen_US
dc.typePosteren_US
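
The abstract describes sentence detection, regular-expression concept matching, and NegEx-style negation detection. The following is a minimal sketch of that idea, not the authors' implementation: the concept patterns, the trigger list, the token window, and the function name `find_concepts` are all assumptions made for illustration. The abstract does not list the actual expressions, and the sketch uses a simple token window rather than the dependency-parse approach the study reports as improving recall.

```python
import re

# Hypothetical concept patterns; the study's actual expressions are not given.
CONCEPT_PATTERNS = [
    re.compile(r"\bpancreatic\s+cysts?\b", re.IGNORECASE),
    re.compile(r"\bcystic\s+(?:lesion|mass|neoplasm)s?\b", re.IGNORECASE),
]

# Illustrative subset of negation triggers in the spirit of NegEx.
NEGATION_TRIGGERS = re.compile(
    r"\b(?:no\s+evidence\s+of|negative\s+for|no|not|without|denies)\b",
    re.IGNORECASE,
)

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Assumed maximum number of tokens allowed between a trigger and a concept.
WINDOW = 5


def find_concepts(report_body):
    """Return (concept_text, is_negated) pairs found in a report body."""
    results = []
    for sentence in SENTENCE_SPLIT.split(report_body.strip()):
        for pattern in CONCEPT_PATTERNS:
            for match in pattern.finditer(sentence):
                preceding = sentence[:match.start()]
                negated = False
                # A concept counts as negated when a trigger precedes it
                # within WINDOW tokens in the same sentence.
                for trigger in NEGATION_TRIGGERS.finditer(preceding):
                    tokens_between = preceding[trigger.end():].split()
                    if len(tokens_between) <= WINDOW:
                        negated = True
                        break
                results.append((match.group(0), negated))
    return results


if __name__ == "__main__":
    text = ("CT shows a pancreatic cyst in the tail. "
            "No evidence of cystic lesion in the liver.")
    for concept, negated in find_concepts(text):
        print(concept, "negated" if negated else "affirmed")
```

A fixed token window is easy to implement but misses negations that are far from the concept or that scope over only part of a sentence, which is consistent with the abstract's motivation for moving to a dependency-parser-based approach.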
Files
Original bundle
Name:
Mehrabi-pancreatic.pdf
Size:
152.36 KB
Format:
Adobe Portable Document Format