Browsing by Author "Mehrabi, Saeed"
Item Advanced natural language processing and temporal mining for clinical discovery (2015-08-17)
Mehrabi, Saeed; Jones, Josette F.; Palakal, Mathew J.; Chien, Stanley Yung-Ping; Liu, Xiaowen; Schmidt, C. Max
There is a vast and growing amount of healthcare data, especially with the rapid adoption of electronic health records (EHRs) as a result of the HITECH Act of 2009. It is estimated that around 80% of clinical information resides in the unstructured narrative of an EHR. Recently, natural language processing (NLP) techniques have offered opportunities to extract the information needed for various clinical applications from unstructured clinical texts. A popular method for enabling secondary uses of EHRs is information or concept extraction, a subtask of NLP that seeks to locate and classify elements within text based on the context. Extracting clinical concepts without considering the context has many complications, including inaccurate diagnosis of patients and contamination of study cohorts. Identifying the negation status of a clinical concept, and whether it belongs to the patient or to a family member, are two of the challenges faced in context detection. A negation algorithm called Dependency Parser Negation (DEEPEN) was developed in this research study by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford Dependency Parser. The study results demonstrate that DEEPEN can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. Additionally, an NLP system consisting of section segmentation and relation discovery was developed to identify patients' family history.
To assess the generalizability of the negation and family history algorithms, data from a different clinical institution was used in both algorithm evaluations.

Item Automated pancreatic cyst screening using natural language processing: a new tool in the early detection of pancreatic cancer (Elsevier, 2015-05)
Roch, Alexandra M.; Mehrabi, Saeed; Krishnan, Anand; Schmidt, Heidi E.; Kesterson, Joseph; Beesley, Chris; Dexter, Paul R.; Palakal, Matthew; Schmidt, C. Max; Department of Surgery, IU School of Medicine
INTRODUCTION: As many as 3% of computed tomography (CT) scans detect pancreatic cysts. Because pancreatic cysts are incidental, ubiquitous and poorly understood, follow-up is often not performed. Pancreatic cysts may have a significant malignant potential and their identification represents a 'window of opportunity' for the early detection of pancreatic cancer. The purpose of this study was to implement an automated Natural Language Processing (NLP)-based pancreatic cyst identification system. METHOD: A multidisciplinary team was assembled. NLP-based identification algorithms were developed based on key words commonly used by physicians to describe pancreatic cysts and programmed for automated search of electronic medical records. A pilot study was conducted prospectively in a single institution. RESULTS: From March to September 2013, 566,233 reports belonging to 50,669 patients were analysed. The mean number of patients reported with a pancreatic cyst was 88/month (range 78-98). The mean sensitivity and specificity were 99.9% and 98.8%, respectively. CONCLUSION: NLP is an effective tool to automatically identify patients with pancreatic cysts based on electronic medical records (EMR).
This highly accurate system can help capture patients 'at-risk' of pancreatic cancer in a registry.

Item DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx (Elsevier, 2015-04)
Mehrabi, Saeed; Krishnan, Krishnan; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew; Surgery, School of Medicine
In Electronic Health Records (EHRs), much valuable information regarding patients’ conditions is embedded in free text. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simplistic approach that has been shown to be powerful in clinical NLP. However, because it does not consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ conditions or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence, using the Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU), and it was further evaluated on a Mayo Clinic dataset to assess its generalizability.
The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs.

Item IDENTIFICATION OF CAUSE AND EFFECT IN CAUSAL SENTENCES OF GERIATRIC CARE DOMAIN USING CONDITIONAL RANDOM (Office of the Vice Chancellor for Research, 2012-04-13)
Mehrabi, Saeed; Krishnan, Anand; Palakal, Mathew
Event extraction is a key step in many text mining applications. Identified events can be used in various applications such as question-answering systems, information extraction, summarization or building the knowledge base of a clinical decision support system. In this study we used PubMed abstracts from the geriatric care domain that were manually categorized into 42 different subdomains and further divided into causal and non-causal sentences by three domain experts. The collected PubMed abstracts contain a total of 19,677 sentences, of which 2,856 were selected and manually annotated with cause and effect events. We used conditional random fields (CRFs), statistical algorithms that sequentially tag each word in a sentence as a cause or effect event based on input variables, or features. The features used in this study are words, word categories (lowercase, uppercase, mix of letters and digits, etc.), affixes, part of speech and phrase chunks such as noun or verb phrases. For every word, a window of features before and after the word was also considered. We tested window sizes of one to five, meaning that one to five features before and after each word were included as input variables. The CRF algorithm was trained and tested on a data set with 2,520 sentences in the training set, 252 sentences in the validation set and 84 sentences in the test set.
A window of four features before and after each word performed best, with 75.1% accuracy and an F-measure of 85% (84.6% precision and 87% recall).

Item Identification of Patients with Family History of Pancreatic Cancer - Investigation of an NLP System Portability (IOS, 2015)
Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M.; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang; Department of BioHealth Informatics, School of Informatics and Computing
In this study we developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customizing the algorithm on the new dataset improves its performance.

Item Pancreatic Cancer Risk Stratification based on Patient Family History (Office of the Vice Chancellor for Research, 2013-04-05)
Krishnan, Anand; Schmidt, C. Max; Roch, Alexandra M.; Beesley, Chris; Mehrabi, Saeed; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew
Background: Pancreatic cancer is the fourth leading cause of cancer-related deaths in the US, with an annual death rate approximating the incidence (38,460 and 45,220, respectively, according to the American Cancer Society, 2013). Due to delayed diagnosis, only 8% of patients are amenable to surgical resection, resulting in a 5-year survival rate of less than 6%. Screening the general population for pancreatic cancer is not feasible because of its low incidence (12.1 per 100,000 per year) and the lack of accurate screening tools. However, patients with an inherited predisposition to pancreatic cancer would benefit from selective screening. Methods: Clinical notes of patients from Indiana University (IU) Hospitals were used in this study. A Natural Language Processing (NLP) system based on the Unstructured Information Management Architecture framework was developed to process the family history data and extract pancreatic cancer information. This was performed through a series of NLP processes including report separation, section separation, sentence detection and keyword extraction. The family members and their corresponding diseases were extracted using regular expressions. The Stanford dependency parser was used to accurately link family members to their diseases. Negation analysis was done using the NegEx algorithm. PancPro risk-prediction software was used to assess the lifetime risk scores of pancreatic cancer for each patient according to his/her family history. A decision tree was constructed based on these scores. Results: A corpus of 2000 reports of patients at IU Hospitals from 1990 to 2012 was collected. The family history section was present in 249 of these reports, containing 463 sentences. The system was able to identify 222 reports (accuracy 87.5%) and 458 sentences (accuracy 91.36%).
Conclusion: The family history risk score will be used for patients’ pancreatic cancer risk stratification, thus contributing to selective screening.

Item Pancreatic Cysts Identification Using Unstructured Information Management Architecture (Office of the Vice Chancellor for Research, 2013-04-05)
Mehrabi, Saeed; Schmidt, C. Max; Waters, Joshua A.; Beesley, Chris; Krishnan, Anand; Kesterson, Joe; Dexter, Paul; Al-Haddad, Mohammed A.; Palakal, Mathew
Pancreatic cancer is one of the deadliest cancers and is mostly diagnosed at late stages. Patients with pancreatic cysts are at higher risk of developing cancer, and surveillance of these patients can help with early diagnosis. Much information about pancreatic cysts can be found in free text in various medical narratives. In this retrospective study, a corpus of 1064 records from 44 patients at Indiana University Hospital from 1990 to 2012 was collected. A natural language processing system was developed and used to identify patients with pancreatic cysts. The input goes through a series of tasks within the Unstructured Information Management Architecture (UIMA) framework, consisting of report separation, metadata detection, sentence detection, concept annotation and writing into the database. Metadata such as medical record number (MRN), report ID, report name, report date and report body were extracted from each report. Sentences were detected, and concepts within each sentence were extracted using regular expressions. A regular expression is a pattern of characters that matches a specific string of text. Our medical team assembled concepts that are used to identify pancreatic cysts in medical reports, and additional keywords were added by searching through the literature and the Unified Medical Language System (UMLS) knowledge base. The NegEx algorithm was used to determine the negation status of concepts. The 1064 reports were divided into training and test sets.
Two pancreatic-cyst surgeons created the gold standard data (inter-annotator agreement K = 88%). The training set was analyzed to modify the regular expressions. Concept identification using the NegEx algorithm resulted in precision and recall of 98.9% and 89%, respectively. To improve the performance of negation detection, the Stanford Dependency Parser (SDP) was used. SDP identifies how words are related to each other in a sentence. The SDP-based negation algorithm improved the recall to 95.7%.
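
The regular-expression concept extraction and trigger-based negation check described in the abstract above can be illustrated roughly as follows. This is a minimal sketch, not the authors' system: the concept pattern, the trigger list, the scope terminators, and the six-token window are all hypothetical stand-ins, and the negation check is a much-simplified NegEx-style heuristic rather than the NegEx algorithm itself.

```python
import re

# Hypothetical concept pattern; the study's actual keyword list is not given.
CONCEPT_RE = re.compile(
    r"\bpancreatic\s+(?:cyst|cysts|pseudocyst|cystic\s+lesion)\b",
    re.IGNORECASE,
)

# Small illustrative subset of negation triggers that precede a concept.
NEGATION_TRIGGERS = ("no", "not", "without", "denies", "negative for")

# Words assumed to end the scope of an earlier negation trigger.
SCOPE_TERMINATORS = ("but", "however", "although")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negated(sentence, concept_start, window=6):
    """Return True if a trigger occurs within `window` tokens before the
    concept and no scope terminator lies between trigger and concept."""
    tokens = re.findall(r"[a-z']+", sentence[:concept_start].lower())
    # Discard everything up to and including the last scope terminator.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i] in SCOPE_TERMINATORS:
            tokens = tokens[i + 1:]
            break
    scope = " ".join(tokens[-window:])
    return any(
        re.search(r"\b" + re.escape(trigger) + r"\b", scope)
        for trigger in NEGATION_TRIGGERS
    )


def extract_concepts(report):
    """Return (concept text, negated?) pairs for every match in the report."""
    findings = []
    for sentence in split_sentences(report):
        for match in CONCEPT_RE.finditer(sentence):
            negated = is_negated(sentence, match.start())
            findings.append((match.group(0).lower(), negated))
    return findings


if __name__ == "__main__":
    sample = (
        "CT shows a pancreatic cyst in the tail. "
        "No evidence of pancreatic cysts. "
        "No fever, but a pancreatic cyst is seen."
    )
    for concept, negated in extract_concepts(sample):
        print(concept, "-> negated" if negated else "-> affirmed")
```

A token-window heuristic like this one only looks at word order, so it can mislabel concepts in longer sentences. That limitation is the motivation the abstract gives for adding a dependency parser, which relates the negation word to the concept through sentence structure instead.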