Browsing by Subject "text mining"
Item: Analyzing Patterns of Literature-Based Phenotyping Definitions for Text Mining Applications (IEEE, 2018-06)
Binkheder, Samar; Wu, Heng-Yi; Quinney, Sara; Li, Lang; BioHealth Informatics, School of Informatics and Computing
Phenotyping definitions are widely used in observational studies that utilize population data from Electronic Health Records (EHRs). Biomedical text mining supports biomedical knowledge discovery. Therefore, we believe that mining phenotyping definitions from the literature can support EHR-based clinical research. However, information about these definitions presented in the literature is inconsistent, diverse, and unknown, especially for text mining usage. Therefore, we aim to analyze patterns of phenotyping definitions as a first step toward developing a text mining application to improve phenotype definition. A random set of observational studies was used for this analysis. Term frequency-inverse document frequency (TF-IDF) and term frequency (TF) were used to rank the terms in the 3958 sentences. Finally, we present preliminary results analyzing patterns of phenotyping definitions.

Item: BlogSum: A Query-based Summarization Approach to Make Sense of Social Media (Office of the Vice Chancellor for Research, IUPUI, 2016-04-08)
Mithun, Shamima
With the rapid growth of the Social Web, a large amount of informal opinionated text is available on numerous topics. However, people can be overwhelmed by this vast amount of information, and they need help finding the information that interests them. Natural language tools for automatically analyzing these opinions become necessary to help individuals, organizations, and governments make timely decisions. To address this need, I proposed a summarization approach for opinionated texts. To validate my approach, BlogSum was developed and evaluated experimentally using current benchmarks. Users can ask BlogSum any question (e.g., Why do people like Chrome better than Firefox?).
To answer a user's question, BlogSum first retrieves relevant blogs and reviews from the web, then generates a concise summary that represents people's opinions expressed towards the topic. Since blog summarization is a more recent endeavor, an error analysis was conducted by manually analyzing blog summaries to find whether blogs need any information processing different from that used for factual data. This analysis shows that question irrelevance and discourse incoherence, which decrease the overall quality and coherence of a summary, are two major issues for blog summaries. To address question irrelevance and discourse incoherence, in this work a domain-independent schema-based summarization approach is developed that utilizes discourse structures. This approach is based on the automatic identification of discourse relations within candidate sentences in order to instantiate the most appropriate discourse schema and to filter and order candidate sentences in the most effective way. BlogSum also needs to deal effectively with opinions and emotions to be successful. BlogSum's overall performance, as well as its performance on question relevance and coherence, was evaluated using various datasets. These results show that the proposed approach can effectively reduce question irrelevance and discourse incoherence and satisfy users' information needs.

Item: Detecting substance-related problems in narrative investigation summaries of child abuse and neglect using text mining and machine learning (Elsevier, 2019-12)
Perron, Brian E.; Victor, Bryan G.; Bushman, Gregory; Moore, Andrew; Ryan, Joseph P.; Lu, Alex Jiahong; Piellusch, Emily K.; School of Social Work
Background: State child welfare agencies collect, store, and manage vast amounts of data. However, they often do not have the right data, or the data is problematic or difficult to use to inform strategies to improve services and system processes. Considerable resources are required to read and code these text data.
Data science and text mining offer potentially efficient and cost-effective strategies for maximizing the value of these data. Objective: The current study tests the feasibility of using text mining for extracting information from unstructured text to better understand substance-related problems among families investigated for abuse or neglect. Method: A state child welfare agency provided written summaries from investigations of child abuse and neglect. Expert human reviewers coded 2956 investigation summaries based on whether the caseworker observed a substance-related problem. These coded documents were used to develop, train, and validate computer models that could perform the coding on an automated basis. Results: A set of computer models achieved greater than 90% accuracy when judged against expert human reviewers. Fleiss kappa estimates among computer models and expert human reviewers exceeded .80, indicating that expert human reviewer ratings are exchangeable with the computer models. Conclusion: These results provide compelling evidence that text mining procedures can be a cost-effective and efficient solution for extracting meaningful insights from unstructured text data. Additional research is necessary to understand how to extract actionable insights from these under-utilized stores of data in child welfare.

Item: Identification and Extraction of Binary, Ternary, Transitive associations and Frequent Patterns from Text Documents in an Interactive Way (Office of the Vice Chancellor for Research, 2013-04-05)
Waranashiwar, Shruti Dilip
As the amount of electronically accessible textual material has been growing exponentially, text mining is a new and exciting research area that tries to solve the information overload problem. It is a promising and automated approach for extracting knowledge from unstructured textual documents.
The purpose of this research in the text mining area is to find compact but high-quality associations from neuroscience-related text documents. Here, we try to find the relationships (binary, ternary, and transitive) between the terms related to some of the common disorders in neuroscience, such as alcoholism and schizophrenia, from the PubMed database, using the Vector Space Model (VSM) and the Artificial Neural Network (ANN). We also use Graphviz to visualize these associations. This research reveals many stronger and weaker associations between the different terms in different comorbidities, which are otherwise difficult to understand by reading articles or journals manually. Once the model is developed, it can be generalized to different terms and can be used to study different combinations of terms and comorbidities. As the response time of these models is very fast, it will greatly contribute towards speeding up medical research. In this light, extracting associations between keywords could provide very interesting insights into their roles in various diseases and other biological processes. We also try to show that, instead of mining all frequent patterns, not all of which may be interesting to the user, an interactive method that mines only desired and interesting patterns is a far better approach in terms of resource utilization. We find the compact but high-quality frequent patterns in an interactive way using an MCMC sampling method. In interactive pattern mining, a user gives feedback on whether a pattern is interesting or not. The discovery of interesting associations has applications in many fields.
A few of them are business decision-making processes, web usage mining, intrusion detection, and bioinformatics.

Item: Redditors in Recovery: Text Mining Reddit to Investigate Transitions into Drug Addiction (IEEE, 2018-12)
Lu, John; Sridhar, Sumati; Pandey, Ritika; Al Hasan, Mohammad; Mohler, George; Computer and Information Science, School of Science
Increasing rates of opioid drug abuse and heightened prevalence of online support communities underscore the necessity of employing data mining techniques to better understand drug addiction using these rapidly developing online resources. In this work, we obtain data from Reddit, an online collection of forums, to gather insight into drug use/misuse using text data from users themselves. Specifically, using user posts, we trained 1) a binary classifier which predicts transitions from casual drug discussion forums to drug recovery forums and 2) a Cox regression model that outputs likelihoods of such transitions. In doing so, we found that utterances of select drugs and certain linguistic features contained in one's posts can help predict these transitions. Using unfiltered drug-related posts, our research delineates drugs that are associated with higher rates of transitions from recreational drug discussion to support/recovery discussion, offers insight into modern drug culture, and provides tools with potential applications in combating the opioid crisis.

Item: Text Mining Online Discussions in an Introductory Physics Course (2018)
Kelley, Patrick; Gavrin, Andrew; Lindell, Rebecca S.; Physics, School of Science
We implemented a social networking platform called Course Networking (CN) in IUPUI’s introductory calculus-based mechanics course and recorded three semesters of online discussions. We used the Syuzhet package in R to evaluate sentiment in the recorded discussions, and to quantify the incidence of eight basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.
We applied this text mining method to over nine thousand posts and replies to identify and analyze student sentiment during three semesters. We also investigated the variation of these emotions throughout the semester, the role played by the most vocal students as compared to the least frequent posters, and gender differences. With an abundance of students’ online discussions, text mining offers an expedient and automated means of analysis, providing a new window into students’ thinking and emotional state during semester-long physics courses.

Item: The bioinformatics toolbox for circRNA discovery and analysis (Oxford University Press, 2021)
Chen, Liang; Wang, Changliang; Sun, Huiyan; Wang, Juexin; Lian, Yanchun; Wang, Yan; Wong, Garry; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
Circular RNAs (circRNAs) are a unique class of RNA molecules, identified more than 40 years ago, that are produced by covalent linkage via back-splicing of linear RNA. Recent advances in sequencing technologies and bioinformatics tools have led directly to an ever-expanding field of types and biological functions of circRNAs. In parallel with technological developments, practical applications of circRNAs have arisen, including their utilization as biomarkers of human disease. Currently, circRNA-associated bioinformatics tools can support projects including circRNA annotation, circRNA identification, and network analysis of competing endogenous RNA (ceRNA). In this review, we collected about 100 circRNA-associated bioinformatics tools and summarized their current attributes and capabilities.
We also performed network analysis and text mining on circRNA tool publications in order to reveal trends in their ongoing development.

Item: Visualizing Social Science Research in an Institutional Repository (2015-06-03)
Polley, David E.
Using text mining and visualization techniques to identify the topical coverage of text corpora is increasingly common in a number of disciplines. When these approaches are applied to the titles and abstracts of articles published in an academic journal, they yield insight into the evolution of scholarly content in the journal. Similarly, text mining and visualization can reveal the topical coverage of items archived in an institutional repository. This poster will present initial results from mining the text and visualizing the abstracts of social science research in one university’s institutional repository. Generating a topic map visually demonstrates how research in a repository clusters around specific domains in the social sciences. These topic maps are potentially useful to librarians and researchers seeking to learn more about the topical coverage of items in their repository and to determine whether the research is reflective of the scholarly output from an institution more broadly.