IU Indianapolis ScholarWorks :: Browsing by Subject "Text Mining"

Browsing by Subject "Text Mining"

Now showing 1 - 5 of 5

Analyzing and evaluating security features in software requirements
(2016-10-28) Hayrapetian, Allenoush; Raje, Rajeev
Software requirements, for complex projects, often contain specifications of non-functional attributes (e.g., security-related features). The process of analyzing such requirements for standards compliance is laborious and error prone. Due to the inherent free-flowing nature of software requirements, it is tempting to apply Natural Language Processing (NLP) and Machine Learning (ML) based techniques for analyzing these documents. In this thesis, we propose a novel semi-automatic methodology that assesses the security requirements of the software system with respect to completeness and ambiguity, creating a bridge between the requirements documents and being in compliance. Security standards, e.g., those introduced by the ISO and OWASP, are compared against annotated software project documents for textual entailment relationships (NLP), and the results are used to train a neural network model (ML) for classifying security-based requirements. Hence, this approach aims to identify the appropriate structures that underlie software requirements documents. Once such structures are formalized and empirically validated, they will provide guidelines to software organizations for generating comprehensive and unambiguous requirements specification documents as related to security-oriented features. The proposed solution will assist organizations during the early phases of developing secure software and reduce overall development effort and costs.
Bridging Text Mining and Bayesian Networks
(2011-03-09) Raghuram, Sandeep Mudabail; Xia, Yuni; Palakal, Mathew; Zou, Xukai, 1963-
After the initial network is constructed using expert’s knowledge of the domain, Bayesian networks need to be updated as and when new data is observed. Literature mining is a very important source of this new data. In this work, we explore what kind of data needs to be extracted with the view to update Bayesian Networks, existing technologies which can be useful in achieving some of the goals and what research is required to accomplish the remaining requirements. This thesis specifically deals with utilizing causal associations and experimental results which can be obtained from literature mining. However, these associations and numerical results cannot be directly integrated with the Bayesian network. The source of the literature and the perceived quality of research needs to be factored into the process of integration, just like a human, reading the literature, would. This thesis presents a general methodology for updating a Bayesian Network with the mined data. This methodology consists of solutions to some of the issues surrounding the task of integrating the causal associations with the Bayesian Network and demonstrates the idea with a semiautomated software system.
Interactive pattern mining of neuroscience data
(2014-01-29) Waranashiwar, Shruti Dilip; Mukhopadhyay, Snehasis; Durresi, Arjan; Xia, Yuni
Text mining is a process of extraction of knowledge from unstructured text documents. We have huge volumes of text documents in digital form. It is impossible to manually extract knowledge from these vast texts. Hence, text mining is used to find useful information from text through the identification and exploration of interesting patterns. The objective of this thesis in text mining area is to find compact but high quality frequent patterns from text documents related to neuroscience field. We try to prove that interactive sampling algorithm is efficient in terms of time when compared with exhaustive methods like FP Growth using RapidMiner tool. Instead of mining all frequent patterns, all of which may not be interesting to user, interactive method to mine only desired and interesting patterns is far better approach in terms of utilization of resources. This is especially observed with large number of keywords. In interactive patterns mining, a user gives feedback on whether a pattern is interesting or not. Using Markov Chain Monte Carlo (MCMC) sampling method, frequent patterns are generated in an interactive way. Thesis discusses extraction of patterns between the keywords related to some of the common disorders in neuroscience in an interactive way. PubMed database and keywords related to schizophrenia and alcoholism are used as inputs. This thesis reveals many associations between the different terms, which are otherwise difficult to understand by reading articles or journals manually. Graphviz tool is used to visualize associations.
Mining the Indianapolis Recorder: An Exploratory Study of a Digital Humanities Dataset
(2014-04-11) Polley, David E.; Coates, Heather L.; Johnson, Jennifer; Odell, Jere D.; Palmer, Kristi L.
This poster presents an initial study using full-text transcripts from the Indianapolis Recorder, one of the nation's most important African American newspapers. Basic text mining and visualization approaches are used to highlight this data set and its potential for use in the digital humanities.
TEXT MINER FOR HYPERGRAPHS USING OUTPUT SPACE SAMPLING
(2011-08-16) Tirupattur, Naveen; Mukhopadhyay, Snehasis; Fang, Shiaofen; Xia, Yuni
Text Mining is process of extracting high-quality knowledge from analysis of textual data. Rapidly growing interest and focus on research in many fields is resulting in an overwhelming amount of research literature. This literature is a vast source of knowledge. But due to huge volume of literature, it is practically impossible for researchers to manually extract the knowledge. Hence, there is a need for automated approach to extract knowledge from unstructured data. Text mining is right approach for automated extraction of knowledge from textual data. The objective of this thesis is to mine documents pertaining to research literature, to find novel associations among entities appearing in that literature using Incremental Mining. Traditional text mining approaches provide binary associations. But it is important to understand context in which these associations occur. For example entity A has association with entity B in context of entity C. These contexts can be visualized as multi-way associations among the entities which are represented by a Hypergraph. This thesis work talks about extracting such multi-way associations among the entities using Frequent Itemset Mining and application of a new concept called Output space sampling to extract such multi-way associations in space and time efficient manner. We incorporated concept of personalization in Output space sampling so that user can specify his/her interests as the frequent hyper-associations are extracted from the text.

Browsing by Subject "Text Mining"

Results Per Page

Sort Options