Browsing by Author "Al-Haddad, Mohammed A."
Pancreatic Cancer Risk Stratification based on Patient Family History
(Office of the Vice Chancellor for Research, 2013-04-05) Krishnan, Anand; Schmidt, C. Max; Roch, Alexandra M.; Beesley, Chris; Mehrabi, Saeed; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew

Background: Pancreatic cancer is the fourth leading cause of cancer-related deaths in the US, with annual deaths approaching annual incidence (38,460 deaths and 45,220 new cases, respectively, according to 2013 American Cancer Society estimates). Due to delayed diagnosis, only 8% of patients are amenable to surgical resection, resulting in a 5-year survival rate of less than 6%. Screening the general population for pancreatic cancer is not feasible because of its low incidence (12.1 per 100,000 per year) and the lack of accurate screening tools. However, patients with an inherited predisposition to pancreatic cancer would benefit from selective screening.

Methods: Clinical notes of patients from Indiana University (IU) Hospitals were used in this study. A Natural Language Processing (NLP) system based on the Unstructured Information Management Architecture (UIMA) framework was developed to process family history data and extract pancreatic cancer information. Processing ran through a series of NLP steps: report separation, section separation, sentence detection, and keyword extraction. Family members and their corresponding diseases were extracted using regular expressions, and the Stanford dependency parser was used to link each family member to his or her diseases. Negation analysis was done using the NegEx algorithm. PancPro risk-prediction software was used to assess each patient's lifetime risk score for pancreatic cancer according to his or her family history, and a decision tree was constructed based on these scores.

Results: A corpus of 2000 reports of patients at IU Hospitals from 1990 to 2012 was collected.
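The Methods above extract family members and diseases with regular expressions and then link them with the Stanford dependency parser. The Python sketch below illustrates only the regular-expression step, under stated assumptions: the keyword lists and the function names (`extract_family_history`, `link_members_to_diseases`, `split_sentences`) are hypothetical, and the "closest preceding mention" linking rule is a simple heuristic swapped in for the dependency parser. It is not the study's method.

```python
import re

# Hypothetical keyword lists; the study's own lexicons are not given in the abstract.
FAMILY_MEMBERS = ["mother", "father", "sister", "brother", "aunt", "uncle",
                  "grandmother", "grandfather", "son", "daughter", "cousin"]
DISEASES = ["pancreatic cancer", "breast cancer", "colon cancer", "diabetes"]


def _alternation(terms):
    """Build a case-insensitive, word-bounded regex matching any of the terms."""
    return re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)


MEMBER_RE = _alternation(FAMILY_MEMBERS)
DISEASE_RE = _alternation(DISEASES)


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def link_members_to_diseases(sentence):
    """Pair each disease mention with the closest family member mentioned before it.

    Stand-in heuristic only (assumption); the study used the Stanford dependency parser.
    """
    members = [(m.start(), m.group(1).lower()) for m in MEMBER_RE.finditer(sentence)]
    pairs = []
    if not members:
        return pairs
    for d in DISEASE_RE.finditer(sentence):
        preceding = [name for start, name in members if start < d.start()]
        # Fall back to the first member if the disease appears before any member.
        member = preceding[-1] if preceding else members[0][1]
        pairs.append((member, d.group(1).lower()))
    return pairs


def extract_family_history(text):
    """Return (family member, disease) pairs found in a family history section."""
    pairs = []
    for sentence in split_sentences(text):
        pairs.extend(link_members_to_diseases(sentence))
    return pairs


print(extract_family_history(
    "Mother had breast cancer. Father died of pancreatic cancer at 65."))
# → [('mother', 'breast cancer'), ('father', 'pancreatic cancer')]
```

For a sentence such as "Sister and maternal aunt with pancreatic cancer." this heuristic links the disease only to the aunt. Cases like this, where word order alone does not reveal which relatives a disease belongs to, are where grammatical relations from a dependency parse are useful.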
The family history section was present in 249 of these reports, containing 463 sentences. The system identified 222 reports (accuracy 87.5%) and 458 sentences (accuracy 91.36%).

Conclusion: The family history risk score will be used to stratify patients' pancreatic cancer risk, thus contributing to selective screening.

Pancreatic Cysts Identification Using Unstructured Information Management Architecture
(Office of the Vice Chancellor for Research, 2013-04-05) Mehrabi, Saeed; Schmidt, C. Max; Waters, Joshua A.; Beesley, Chris; Krishnan, Anand; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew

Pancreatic cancer is one of the deadliest cancers and is mostly diagnosed at late stages. Patients with pancreatic cysts are at higher risk of developing cancer, and surveillance of these patients can help with early diagnosis. Much information about pancreatic cysts is recorded as free text in various medical narratives. In this retrospective study, a corpus of 1064 records from 44 patients at Indiana University Hospital from 1990 to 2012 was collected. A natural language processing system was developed and used to identify patients with pancreatic cysts. The input passes through a series of tasks within the Unstructured Information Management Architecture (UIMA) framework: report separation, metadata detection, sentence detection, concept annotation, and writing into the database. Metadata such as medical record number (MRN), report ID, report name, report date, and report body were extracted from each report. Sentences were detected, and concepts within each sentence were extracted using regular expressions. A regular expression is a pattern of characters that matches a specific string of text. Our medical team assembled the concepts used to identify pancreatic cysts in medical reports, and additional keywords were added by searching the literature and the Unified Medical Language System (UMLS) knowledge base.
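As a rough illustration of the sentence detection and concept annotation steps described above, the following Python sketch splits a report body into sentences and tags each one with regular-expression matches. The concept names and patterns in the hypothetical `CONCEPT_PATTERNS` table are placeholders, not the curated list assembled by the study's medical team, and the punctuation-based splitter is an assumed simplification of the system's UIMA sentence detector.

```python
import re
from dataclasses import dataclass

# Hypothetical concept patterns; the study's curated concept list is not given in the abstract.
CONCEPT_PATTERNS = {
    "pancreatic_cyst": re.compile(r"\b(?:pancreatic|pancreas)\s+cysts?\b", re.IGNORECASE),
    "ipmn": re.compile(r"\b(?:IPMN|intraductal papillary mucinous neoplasms?)\b", re.IGNORECASE),
    "mucinous_cystic_neoplasm": re.compile(r"\bmucinous cystic neoplasms?\b", re.IGNORECASE),
}


@dataclass
class Annotation:
    sentence_index: int  # position of the sentence within the report body
    concept: str         # key from CONCEPT_PATTERNS
    text: str            # matched surface string
    start: int           # character offsets within the sentence
    end: int


def split_sentences(report_body):
    """Naive sentence detection on terminal punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", report_body.strip()) if s]


def annotate(report_body):
    """Run every concept pattern over every sentence and collect the matches."""
    annotations = []
    for i, sentence in enumerate(split_sentences(report_body)):
        for concept, pattern in CONCEPT_PATTERNS.items():
            for m in pattern.finditer(sentence):
                annotations.append(Annotation(i, concept, m.group(0), m.start(), m.end()))
    return annotations


body = "CT shows a 2 cm pancreatic cyst. Findings suggest IPMN. No other lesions."
for a in annotate(body):
    print(a.sentence_index, a.concept, a.text)
# → 0 pancreatic_cyst pancreatic cyst
# → 1 ipmn IPMN
```

This step alone would also annotate a sentence such as "No pancreatic cyst is seen."; deciding whether a matched concept is negated is a separate stage in the pipeline.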
The NegEx algorithm was used to determine the negation status of concepts. The 1064 reports were divided into training and test sets. Two pancreatic-cyst surgeons created the gold standard data (inter-annotator agreement K = 88%). The training set was analyzed to refine the regular expressions. Concept identification using the NegEx algorithm achieved a precision of 98.9% and a recall of 89%. To improve negation detection, the Stanford Dependency Parser (SDP) was used; SDP identifies the grammatical relations between words in a sentence. The SDP-based negation algorithm improved recall to 95.7%.
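The following Python sketch shows the general idea behind window-based negation detection in the spirit of NegEx, together with how precision and recall are computed against a gold standard. It is simplified and rests on assumptions: the trigger lists are a small hypothetical subset of the real NegEx lexicon, the five-word window is an assumed parameter, and NegEx's pseudo-negation and termination terms are omitted. It does not reproduce the study's SDP-based algorithm.

```python
import re

# Hypothetical subset of NegEx-style triggers (assumption, not the full lexicon).
PRE_NEGATION = ["no evidence of", "negative for", "without", "denies", "no"]
POST_NEGATION = ["was ruled out", "is ruled out"]
WINDOW = 5  # assumed maximum number of words between trigger and concept


def _tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _find(seq, sub):
    """Start indices where the token list `sub` occurs in `seq`."""
    n = len(sub)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == sub]


def is_negated(sentence, concept):
    """True if any occurrence of `concept` lies within WINDOW words of a trigger."""
    toks, c = _tokens(sentence), _tokens(concept)
    for c_start in _find(toks, c):
        c_end = c_start + len(c)
        for trig in PRE_NEGATION:  # trigger must end before the concept starts
            t = _tokens(trig)
            for t_start in _find(toks, t):
                t_end = t_start + len(t)
                if t_end <= c_start and c_start - t_end <= WINDOW:
                    return True
        for trig in POST_NEGATION:  # trigger must start after the concept ends
            for t_start in _find(toks, _tokens(trig)):
                if t_start >= c_end and t_start - c_end <= WINDOW:
                    return True
    return False


def precision_recall(predicted, gold):
    """Precision and recall of predicted items against a gold-standard set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


print(is_negated("No evidence of pancreatic cyst.", "pancreatic cyst"))  # → True
print(is_negated("A 2 cm pancreatic cyst is seen.", "pancreatic cyst"))  # → False
```

Because termination terms (such as "but") are omitted here, "No pain, but a pancreatic cyst is present." is wrongly flagged as negated. Full NegEx handles that particular case with termination terms, and the dependency-based approach in the abstract addresses negation scope through grammatical relations rather than word distance.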