Browsing by Subject "Natural Language Processing (NLP)"
Now showing 1 - 5 of 5
Item Automatic Extraction of Computer Science Concept Phrases Using a Hybrid Machine Learning Paradigm (2023-05)
Jahin, S M Abrar; Al Hasan, Mohammad; Fang, Shiaofen; Mukhopadhyay, Snehasis

With the proliferation of computer science in modern society, the number of computer science-related jobs is expanding quickly. Software engineer was chosen as the best job for 2023 based on pay, stress level, opportunity for professional growth, and work-life balance, according to rankings compiled by various news outlets, journals, and publications. Computer science occupations are anticipated to be in high demand not just in 2023 but for the foreseeable future, so it is not surprising that the number of computer science students at universities is growing and will continue to grow. The enormous increase in student enrollment across the subdisciplines of computing has presented some distinct issues. If computer science is to be incorporated into the K-12 curriculum, it is vital that K-12 educators be competent, but one of the biggest obstacles to this plan is that there are not enough trained computer science teachers. Numerous new fields and applications are continually being introduced to computer science, and it is difficult for schools to recruit skilled computer science instructors for a variety of reasons, including low salaries. Utilizing the K-12 teachers who are already in the schools, who have a love for teaching and consider it a vocation, is therefore the most effective strategy for addressing this issue. So, if we want teachers to grasp computer science topics quickly, we need to give them an easy way to learn about the field. To simplify and expedite the study of computer science, we must acquaint schoolteachers with the terminology associated with computer science concepts so they know what they need to learn according to their profiles.
If we want to make it easier for schoolteachers to comprehend computer science concepts, it would be ideal to provide them with a tree of words and phrases from which they could determine where each phrase originated and which phrases are connected to it, so that the concepts can be learned effectively. To find a good concept word or phrase, we must first identify concepts and then establish their connections or linkages. As computer science is a fast-developing field, its nomenclature is also expanding at a frenetic rate, so adding every concept and term to the knowledge graph by hand would be a challenging endeavor. Creating a system that automatically adds computer science domain terms to the knowledge graph is a straightforward solution to this issue. We have identified knowledge graph use cases for the schoolteacher training program, which motivate the development of a knowledge graph, and we have analyzed these use cases and the knowledge graph's ideal characteristics. We have designed a web-based system for adding, editing, and removing words from a knowledge graph. In addition, a term or phrase can be represented with its children list, parent list, and synonym list for enhanced comprehension. We have developed an automated word and phrase extraction system that can extract computer science concept phrases from any supplied text, thereby enriching the knowledge graph.
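The parent/children/synonym representation and the add/edit/remove operations described above can be sketched in a few lines of Python. The `ConceptGraph` class and its method names are illustrative assumptions, not the authors' actual web-based system:

```python
# Minimal sketch of a concept knowledge graph with parent, children, and
# synonym lists per term. Class and method names are hypothetical.

class ConceptGraph:
    def __init__(self):
        # term -> {"parents": set, "children": set, "synonyms": set}
        self.nodes = {}

    def add_term(self, term):
        self.nodes.setdefault(
            term, {"parents": set(), "children": set(), "synonyms": set()}
        )

    def link(self, parent, child):
        # Establish a parent -> child connection between two concepts.
        self.add_term(parent)
        self.add_term(child)
        self.nodes[parent]["children"].add(child)
        self.nodes[child]["parents"].add(parent)

    def add_synonym(self, term, synonym):
        self.add_term(term)
        self.nodes[term]["synonyms"].add(synonym)

    def remove_term(self, term):
        # Drop the term and every edge that points at it.
        for other in self.nodes.values():
            other["parents"].discard(term)
            other["children"].discard(term)
        self.nodes.pop(term, None)

    def profile(self, term):
        # The per-term view described in the abstract: parents,
        # children, and synonyms for enhanced comprehension.
        n = self.nodes[term]
        return {"parents": sorted(n["parents"]),
                "children": sorted(n["children"]),
                "synonyms": sorted(n["synonyms"])}

g = ConceptGraph()
g.link("machine learning", "supervised learning")
g.link("supervised learning", "decision trees")
g.add_synonym("machine learning", "ML")
print(g.profile("supervised learning"))
# -> {'parents': ['machine learning'], 'children': ['decision trees'], 'synonyms': []}
```

A teacher browsing "supervised learning" would thus see where the phrase sits in the hierarchy and which related phrases to learn next.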
Therefore, we have designed the knowledge graph for use in teacher education so that schoolteachers can teach K-12 students computer science topics effectively.

Item Improving the Performance of Clinical Prediction Tasks by Using Structured and Unstructured Data Combined with a Patient Network (2021-08)
Nouri Golmaei, Sara; Luo, Xiao; King, Brian; Zhang, Qingxue

With the increasing availability of Electronic Health Records (EHRs) and advances in deep learning techniques, developing deep predictive models that use EHR data to solve healthcare problems has gained momentum in recent years. The majority of clinical predictive models benefit from structured data in EHRs (e.g., lab measurements and medications). Still, learning clinical outcomes from all possible information sources is one of the main challenges when building predictive models. This work focuses mainly on two sources of information that have been underused by researchers: unstructured data (e.g., clinical notes) and a patient network. We propose a novel hybrid deep learning model, DeepNote-GNN, that integrates clinical note information and patient network topological structure to improve 30-day hospital readmission prediction. DeepNote-GNN is a robust deep learning framework consisting of two modules: DeepNote and the patient network. DeepNote extracts deep representations of clinical notes using a feature aggregation unit on top of a state-of-the-art Natural Language Processing (NLP) technique, BERT. By exploiting these deep representations, a patient network is built, and a Graph Neural Network (GNN) is trained on the network for hospital readmission prediction. Performance evaluation on the MIMIC-III dataset demonstrates that DeepNote-GNN achieves superior results compared to state-of-the-art baselines on the 30-day hospital readmission task. We extensively analyze the DeepNote-GNN model to illustrate the effectiveness and contribution of each of its components.
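The patient-network construction step described above can be sketched as follows: patients whose note representations are similar get connected by an edge. The toy vectors and the similarity threshold are invented for illustration; in the paper the embeddings come from the BERT-based DeepNote module, and the resulting graph is then fed to a GNN, both of which are omitted here:

```python
# Sketch (with made-up data) of building a patient network from note
# embeddings: an edge links two patients whose embeddings are similar.
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def build_patient_network(embeddings, threshold=0.9):
    """Return an adjacency list linking patients with similar notes."""
    ids = list(embeddings)
    edges = {pid: set() for pid in ids}
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if cosine(embeddings[p], embeddings[q]) >= threshold:
                edges[p].add(q)
                edges[q].add(p)
    return edges

# Toy note embeddings; real ones would be BERT outputs per patient.
toy = {"p1": [0.9, 0.1, 0.0],
       "p2": [0.85, 0.15, 0.05],
       "p3": [0.0, 0.2, 0.98]}
print(build_patient_network(toy))
# -> {'p1': {'p2'}, 'p2': {'p1'}, 'p3': set()}
```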
The model analysis shows that the patient network makes a significant contribution to overall performance, and that DeepNote-GNN is robust and can consistently perform well on the 30-day readmission prediction task. To evaluate the generalization of the DeepNote and patient network modules to new prediction tasks, we create a multimodal model and train it on structured and unstructured data from the MIMIC-III dataset to predict patient mortality and Length of Stay (LOS). Our proposed multimodal model consists of four components: DeepNote, the patient network, DeepTemporal, and score aggregation. While DeepNote keeps its functionality and extracts representations of clinical notes, we build a DeepTemporal module, a fully connected layer stacked on top of a one-layer Gated Recurrent Unit (GRU), to extract deep representations of temporal signals. Independently of DeepTemporal, we extract feature vectors of the temporal signals and use them to build a patient network. Finally, the DeepNote, DeepTemporal, and patient network scores are linearly aggregated to fit the multimodal model on downstream prediction tasks. Our results are very competitive with the baseline model. The multimodal model analysis reveals that unstructured text data are more helpful for prediction than temporal signals, and that there is no limitation to applying a patient network to structured data. In comparison to the other modules, the patient network makes a more significant contribution to the prediction tasks. We believe that our efforts in this work have opened up a new area of study that can be used to enhance the performance of clinical predictive models.

Item Multi-label natural language processing to identify diagnosis and procedure codes from MIMIC-III inpatient notes (arXiv, 2020)
Bhavani Singh, A. K.; Guntu, Mounika; Bhimireddy, Ananth Reddy; Gichoya, Judy W.; Purkayastha, Saptarshi; BioHealth Informatics, School of Informatics and Computing

In the United States, administrative costs involving services for medical coding and billing account for 25%, or more than 200 billion dollars, of hospital spending. With the increasing number of patient records, manual assignment of codes is overwhelming, time-consuming, and error-prone, causing billing errors. Natural language processing can automate the extraction of codes/labels from unstructured clinical notes, which can help human coders save time, increase productivity, and verify medical coding errors. Our objective is to identify appropriate diagnosis and procedure codes from clinical notes by performing multi-label classification. We used de-identified data of critical care patients from the MIMIC-III database and subset the data to select the ten (top-10) and fifty (top-50) most common diagnoses and procedures, which cover 47.45% and 74.12% of all admissions, respectively. We implemented state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) to fine-tune the language model on 80% of the data and validated it on the remaining 20%. The model achieved an overall accuracy of 87.08%, an F1 score of 85.82%, and an AUC of 91.76% for the top-10 codes. For the top-50 codes, our model achieved an overall accuracy of 93.76%, an F1 score of 92.24%, and an AUC of 91%. Compared to previously published research, our model outperforms prior work in predicting codes from clinical text. We discuss approaches to generalize the knowledge discovery process of our MIMIC-BERT to other clinical notes.
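The multi-label setup above differs from ordinary classification in that each code is decided independently, so a single note can receive several codes at once. A common way to do this, sketched below with invented logits and code names (BERT itself is omitted), is to pass each per-code logit through a sigmoid and keep every code whose probability clears a threshold:

```python
# Stand-in for a multi-label classifier head: one logit per code,
# sigmoid per logit, independent thresholding. Logits and the example
# ICD-9 code descriptions are invented for illustration.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_codes(logits, codes, threshold=0.5):
    """Return every code whose sigmoid probability clears the threshold."""
    return [c for c, z in zip(codes, logits) if sigmoid(z) >= threshold]

codes = ["401.9 hypertension", "428.0 heart failure", "250.00 diabetes"]
print(predict_codes([2.1, -1.3, 0.4], codes))
# -> ['401.9 hypertension', '250.00 diabetes']
```

Because labels are scored independently, a note with hypertension and diabetes but no heart failure is handled naturally, which is what distinguishes this task from single-label document classification.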
This can help human coders save time and prevent backlogs and the additional costs due to coding errors.

Item Predictive Coding Systems for Electronic Discovery (2016-10-14)
Soundarajan, Dhivya; Hook, Sara Anne

Over the past year, the presenters have been working to design a simple predictive coding system for electronic discovery (e-discovery) based on readily available software and Natural Language Processing. In this presentation, they cover the history of e-discovery in the U.S.; the evolution and increasing acceptance of predictive coding/Technology-Assisted Review (TAR) as part of an e-discovery process; considerations, including software choices, in constructing a predictive coding system; and a demonstration of the prototype that Ms. Soundarajan developed. They will discuss their future work, which will include usability testing of the system with a focus group of lawyers who are responsible for e-discovery, the features and functionality that they would like to add to the system, and the larger set of materials they would like to experiment with to further refine the system's capabilities.

Item SEMANTIC MAPPING OF STEM CONTENTS FOR AURAL REPRESENTATION USING LITERATURE MINING (Office of the Vice Chancellor for Research, 2012-04-13)
Bharadwaj, Venkatesh; Palakal, Mathew; Mannheimer, Steven

As STEM education increasingly relies on illustrations, animations, and video to communicate complex concepts, blind and visually impaired (BVI) students are increasingly left behind. However, tablet computers and other digital technologies offer the potential for a sound-based solution that leverages the ability of BVI students to “think aurally” beyond simple spoken terminology. Previous work has shown that non-verbal sound can improve educational outcomes for BVI students. The challenge is translating science concepts that may be essentially soundless (e.g., photosynthesis or cumulus clouds) into sounds that communicate the component ideas of a concept.
One key is to consider any science concept as a process or activity with actions and actors, and to identify sounds that refer to them. Our research focuses on computational strategies for analyzing the sentences used in standard K-12 textbooks to define or describe any given science concept-activity and for generating a semantic sequence of words that correlates to the sounds that can best portray or embody it. This is done with the help of Natural Language Processing (NLP) tools in combination with a newly developed Information Extraction (IE) algorithm. Because each word in a semantic sequence can potentially correlate to multiple sounds, it is necessary to find a dynamic path connecting the list of sounds that represent a word sequence in the context of the given science process or categorical domain. For example, there are multiple sounds associated with the basic concept “water”: splashing, pouring, drops dripping. But in the context of “precipitation,” dripping is most relevant. The algorithm that identifies the best concept-to-sound correlations is a newly developed, self-learning, and adaptive algorithm. This research supports, and is informed by, experiments in aural pedagogy conducted at the Indiana School for the Blind and Visually Impaired. Our long-term goal is the generation of a language of non-verbal sounds.
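The context-sensitive word-to-sound selection described above (water mapping to dripping in the context of precipitation) can be sketched as a lookup that scores each candidate sound by its overlap with the current topic. The sound inventory and context tags below are invented for illustration; the authors' actual algorithm is self-learning and adaptive rather than a fixed table:

```python
# Hypothetical sound bank: each word maps to candidate sounds, each
# tagged with the contexts in which that sound is most relevant.
SOUND_BANK = {
    "water": [
        ("splashing", {"swimming", "ocean", "play"}),
        ("pouring", {"cooking", "laboratory"}),
        ("dripping", {"precipitation", "rain", "leak"}),
    ],
}

def best_sound(word, context_tags):
    """Pick the candidate sound whose tags overlap the context most."""
    candidates = SOUND_BANK.get(word, [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c[1] & context_tags))[0]

print(best_sound("water", {"precipitation", "clouds"}))
# -> 'dripping'
```

Chaining this lookup across a semantic word sequence would give the dynamic path of sounds the abstract describes for a given science process.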