Browsing by Author "Beesley, Chris"
Item: Automated pancreatic cyst screening using natural language processing: a new tool in the early detection of pancreatic cancer (Elsevier, 2015-05)
Roch, Alexandra M.; Mehrabi, Saeed; Krishnan, Anand; Schmidt, Heidi E.; Kesterson, Joseph; Beesley, Chris; Dexter, Paul R.; Palakal, Matthew; Schmidt, C. Max; Department of Surgery, IU School of Medicine
INTRODUCTION: As many as 3% of computed tomography (CT) scans detect pancreatic cysts. Because pancreatic cysts are incidental, ubiquitous and poorly understood, follow-up is often not performed. Pancreatic cysts may have a significant malignant potential and their identification represents a 'window of opportunity' for the early detection of pancreatic cancer. The purpose of this study was to implement an automated Natural Language Processing (NLP)-based pancreatic cyst identification system. METHOD: A multidisciplinary team was assembled. NLP-based identification algorithms were developed based on key words commonly used by physicians to describe pancreatic cysts and programmed for automated search of electronic medical records. A pilot study was conducted prospectively in a single institution. RESULTS: From March to September 2013, 566,233 reports belonging to 50,669 patients were analysed. The mean number of patients reported with a pancreatic cyst was 88/month (range 78-98). The mean sensitivity and specificity were 99.9% and 98.8%, respectively. CONCLUSION: NLP is an effective tool to automatically identify patients with pancreatic cysts based on electronic medical records (EMR). This highly accurate system can help capture patients 'at-risk' of pancreatic cancer in a registry.

Item: DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx (Elsevier, 2015-04)
Mehrabi, Saeed; Krishnan, Krishnan; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew; Surgery, School of Medicine
In Electronic Health Records (EHRs), much valuable information regarding patients’ conditions is embedded in free-text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, because it does not consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ conditions or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence using the Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU) and it was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs.

Item: Identification of Patients with Family History of Pancreatic Cancer - Investigation of an NLP System Portability (IOS, 2015)
Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M.; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang; Department of BioHealth Informatics, School of Informatics and Computing
In this study we have developed a rule-based natural language processing (NLP) system to identify patients with family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6% respectively. Negation detection resulted in precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.

Item: Pancreatic Cancer Risk Stratification based on Patient Family History (Office of the Vice Chancellor for Research, 2013-04-05)
Krishnan, Anand; Schmidt, C. Max; Roch, Alexandra M.; Beesley, Chris; Mehrabi, Saeed; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew
Background: Pancreatic cancer is the fourth leading cause of cancer-related deaths in the US with an annual death rate approximating the incidence (38,460 and 45,220 respectively according to 2013 American Cancer Society figures). Due to delayed diagnosis, only 8% of patients are amenable to surgical resection, resulting in a 5-year survival rate of less than 6%. Screening the general population for pancreatic cancer is not feasible because of its low incidence (12.1 per 100,000 per year) and the lack of accurate screening tools. However, patients with an inherited predisposition to pancreatic cancer would benefit from selective screening. Methods: Clinical notes of patients from Indiana University (IU) Hospitals were used in this study. A Natural Language Processing (NLP) system based on the Unstructured Information Management Architecture framework was developed to process the family history data and extract pancreatic cancer information. This was performed through a series of NLP processes including report separation, section separation, sentence detection and keyword extraction. The family members and their corresponding diseases were extracted using regular expressions. The Stanford dependency parser was used to accurately link family members to their diseases. Negation analysis was done using the NegEx algorithm. PancPro risk-prediction software was used to assess the lifetime risk scores of pancreatic cancer for each patient according to his/her family history. A decision tree was constructed based on these scores. Results: A corpus of 2000 reports of patients at IU Hospitals from 1990 to 2012 was collected. The family history section was present in 249 of these reports containing 463 sentences. The system was able to identify 222 reports (accuracy 87.5%) and 458 sentences (accuracy 91.36%). Conclusion: The family history risk score will be used for patients’ pancreatic cancer risk stratification, thus contributing to selective screening.

Item: Pancreatic Cysts Identification Using Unstructured Information Management Architecture (Office of the Vice Chancellor for Research, 2013-04-05)
Mehrabi, Saeed; Schmidt, C. Max; Waters, Joshua A.; Beesley, Chris; Krishnan, Anand; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew
Pancreatic cancer is one of the deadliest cancers, mostly diagnosed at late stages. Patients with pancreatic cysts are at higher risk of developing cancer and surveillance of these patients can help with early diagnosis. Much information about pancreatic cysts can be found in free text format in various medical narratives. In this retrospective study, a corpus of 1064 records from 44 patients at Indiana University Hospital from 1990 to 2012 was collected. A natural language processing system was developed and used to identify patients with pancreatic cysts. The input goes through a series of tasks within the Unstructured Information Management Architecture (UIMA) framework consisting of report separation, metadata detection, sentence detection, concept annotation and writing into the database. Metadata such as medical record number (MRN), report id, report name, report date, and report body were extracted from each report. Sentences were detected and concepts within each sentence were extracted using regular expressions. A regular expression is a pattern of characters that matches a specific string of text. Our medical team assembled concepts that are used to identify pancreatic cysts in medical reports and additional keywords were added by searching through the literature and the Unified Medical Language System (UMLS) knowledge base. The NegEx algorithm was used to determine the negation status of concepts. The 1064 reports were divided into training and test sets. Two pancreatic-cyst surgeons created the gold standard data (inter-annotator agreement K=88%). The training set was analyzed to refine the regular expressions. The concept identification using the NegEx algorithm resulted in precision and recall of 98.9% and 89% respectively. In order to improve the performance of negation detection, the Stanford Dependency Parser (SDP) was used. SDP determines how words are related to each other in a sentence. The SDP-based negation algorithm improved the recall to 95.7%.
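
The regular-expression concept matching and trigger-based negation check described in the abstracts above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' system: the concept pattern, the trigger list, the `WINDOW` size, and the `find_concepts` helper are all assumptions made for the example, and the real NegEx algorithm uses a much larger trigger lexicon and additional rules.

```python
import re

# Hypothetical concept pattern; the studies' actual keyword lists are not given.
CONCEPT = re.compile(r"\bpancreatic\s+(?:cyst|pseudocyst)s?\b", re.IGNORECASE)

# Small illustrative subset of negation triggers that precede a concept.
NEG_TRIGGERS = re.compile(
    r"\b(?:no evidence of|negative for|without|denies|no)\b", re.IGNORECASE
)

WINDOW = 5  # assumed maximum number of tokens between trigger and concept


def find_concepts(sentence):
    """Return (matched_text, negated) for each concept mention in a sentence."""
    results = []
    for m in CONCEPT.finditer(sentence):
        preceding = sentence[:m.start()]
        negated = False
        for t in NEG_TRIGGERS.finditer(preceding):
            # Count the tokens lying between the trigger and the concept.
            gap = preceding[t.end():].split()
            if len(gap) <= WINDOW:
                negated = True
        results.append((m.group(0), negated))
    return results
```

With this sketch, "No evidence of pancreatic cysts." is flagged as negated, but so is "No change in the pancreatic cyst.", where the cyst is in fact present. That kind of window-based false positive is what the dependency-parsing step described in the abstracts is meant to reduce.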