IU Indianapolis ScholarWorks :: Browsing by Subject "NLP"

Browsing by Subject "NLP"

Now showing 1 - 6 of 6

Analyzing and evaluating security features in software requirements
(2016-10-28) Hayrapetian, Allenoush; Raje, Rajeev
Software requirements for complex projects often contain specifications of non-functional attributes (e.g., security-related features). Analyzing such requirements for standards compliance is laborious and error-prone. Because software requirements are written in free-flowing natural language, it is tempting to apply Natural Language Processing (NLP) and Machine Learning (ML) techniques to analyze these documents. In this thesis, we propose a novel semi-automatic methodology that assesses the security requirements of a software system with respect to completeness and ambiguity, bridging the gap between requirements documents and standards compliance. Security standards, e.g., those introduced by the ISO and OWASP, are compared against annotated software project documents for textual entailment relationships (NLP), and the results are used to train a neural network model (ML) that classifies security-based requirements. This approach thus aims to identify the structures that underlie software requirements documents. Once such structures are formalized and empirically validated, they will provide guidelines that help software organizations write comprehensive and unambiguous requirements specifications for security-oriented features. The proposed solution will assist organizations during the early phases of developing secure software and reduce overall development effort and costs.
Attention Mechanism with BERT for Content Annotation and Categorization of Pregnancy-Related Questions on a Community Q&A Site
(IEEE, 2020-12) Luo, Xiao; Ding, Haoran; Tang, Matthew; Gandhi, Priyanka; Zhang, Zhan; He, Zhe; Engineering Technology, School of Engineering and Technology
In recent years, the social web has been increasingly used for health information seeking and sharing, and for subsequent health-related research. Women often use the Internet or social networking sites to seek pregnancy-related information at different stages. They may ask questions about birth control, trying to conceive, labor, or caring for a newborn or baby. Classifying the different types of pregnancy-related questions (e.g., before, during, and after pregnancy) can inform the design of social media and professional websites for pregnancy education and support. This research investigates attention mechanisms, either built into the BERT model or added on top of it, for classifying and annotating pregnancy-related questions posted on a community Q&A site. We evaluated two BERT-based models and compared them against traditional machine learning models for question classification. Most importantly, we investigated two attention mechanisms for relevant-term annotation: the built-in self-attention mechanism of BERT and an additional attention layer on top of BERT. In classification, the BERT-based models outperformed the traditional models, and BERT with an additional attention layer achieved higher overall precision than the basic BERT model. The results also showed that the two attention mechanisms annotate relevant content differently, and that both could serve as feature selection methods for text mining in general.
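The abstract does not specify the form of the additional attention layer. As a minimal sketch, assuming a common additive (Bahdanau-style) attention pooling over BERT's token-level hidden states, the layer scores each token, normalizes the scores with a softmax, and pools the hidden states by those weights; the weights then double as per-token relevance scores for annotation. The names `attention_pool`, `W`, and `v` are hypothetical, and the toy 2-dimensional vectors stand in for BERT's much larger hidden states.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], W: List[Vector], v: Vector) -> Tuple[List[float], Vector]:
    """Additive attention pooling (hypothetical layer, not the paper's exact one).

    score_i = v . tanh(W h_i); alpha = softmax(score); pooled = sum_i alpha_i * h_i
    Returns (alpha, pooled); alpha can be read as per-token relevance.
    """
    scores = []
    for h in hidden:
        # Project the token vector with W, squash with tanh, then score with v.
        projected = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(vj * pj for vj, pj in zip(v, projected)))
    alpha = softmax(scores)
    dim = len(hidden[0])
    # Weighted sum of token vectors; a classifier head would consume this vector.
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return alpha, pooled


# Toy example: three "token" vectors, identity projection, scoring on the first axis.
hidden = [[0.1, 0.0], [2.0, 0.5], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 0.0]
alpha, pooled = attention_pool(hidden, W, v)
```

In a trained model, `W` and `v` would be learned jointly with the classification head, and the token with the largest weight (here the second one) is the one the layer would highlight as most relevant.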
Natural Language Processing and Extracting Information From Medical Reports
(2006-06-29T19:24:21Z) Pfeiffer II, Richard D.; McDaniel, Anna M.
The purpose of this study is to examine the current use of natural language processing (NLP) for extracting meaningful data from free text in medical reports. Natural language processing has been used to process information from various genres. To evaluate its use, a synthesized review was conducted of primary research papers on natural language processing and data extraction from medical reports. A three-phase approach is used to describe the process of gathering the final metrics for validating the use of natural language processing. The main purpose of any NLP system is to extract or understand human language and to process it into meaning for a specified area of interest or end-user. There are three types of approaches: symbolic, statistical, and connectionist. The research identifies problems with natural language processing and with these approaches, namely acquisition, coverage, robustness, and extensibility. Metrics were gathered from primary research papers to evaluate the success of the natural language processors. Average recall across four papers was 85%, and average precision across five papers was 87.7%. Average accuracy was 97%, average sensitivity was 84%, and specificity was 97.4%. Based on the results of the primary research, no single NLP approach could be definitively validated as an industry standard. The research reviewed makes clear that there has been at least limited success with extracting information from free text using natural language processing. It is important to understand the continuum of data, information, and knowledge in previous and future research on natural language processing. In health informatics, this technology is necessary for improving healthcare and research.
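The review reports averages of standard classification metrics. For reference, all of them derive from the four cells of a binary confusion matrix, and sensitivity is the same quantity as recall. The counts below are hypothetical, chosen only to show how accuracy and specificity can sit well above recall when true negatives dominate, which can happen when the findings being extracted are relatively rare in the reports.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # true positives: items correctly extracted
    fp: int  # false positives: items extracted in error
    tn: int  # true negatives: items correctly left out
    fn: int  # false negatives: items missed


def metrics(c: Confusion) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "precision": c.tp / (c.tp + c.fp),
        "recall": c.tp / (c.tp + c.fn),  # identical to sensitivity
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
    }


# Hypothetical counts for one system evaluated on 1000 report items.
m = metrics(Confusion(tp=85, fp=10, tn=890, fn=15))
```

With these counts, recall is 0.85 while accuracy is 0.975, because the 890 true negatives inflate accuracy and specificity without saying anything about missed findings.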
A Security Related and Evidence-Based Holistic Ranking and Composition Framework for Distributed Services
(2021-05) Chowdhury, Nahida Sultana; Raje, Rajeev R.; Tuceryan, Mihran; Hill, James; Xia, Yuni
The number of smart mobile devices has grown at a significant rate in recent years. This growth has resulted in an exponential increase in the number of publicly available mobile Apps. To help users select suitable Apps from the many available choices, App distribution platforms generally rank or recommend Apps based on average star ratings, the number of installs, and associated reviews: all external factors of an App. However, these ranking schemes typically ignore critical internal factors of the Apps (e.g., bugs, security vulnerabilities, and data leaks). AppStores need a holistic methodology that includes both internal and external factors to assign a level of trust to Apps. Including the internal factors would capture the associated potential security risks. This issue is even more crucial for newly available Apps, for which user reviews are sparse or the number of installs is still insignificant. In such a scenario, users may fail to estimate the potential risks of installing Apps from an AppStore. This dissertation proposes a security-related and evidence-based ranking framework, called SERS (Security-related and Evidence-based Ranking Scheme), to compare similar Apps. The trust associated with an App is calculated from both internal and external factors (i.e., security flaws and user reviews) following an evidence-based approach and applying subjective logic principles. SERS is formalized and further enhanced in the second part of this dissertation, resulting in an enhanced version called E-SERS (Enhanced SERS). These enhancements include the ability to integrate any number of sources that can generate evidence for an App and to consider the temporal aspect and reputation of evidence sources.
Both SERS and E-SERS are evaluated using publicly accessible Apps from the Google PlayStore, and the rankings they generate are compared with prevalent ranking techniques such as average star ratings and the Google PlayStore rankings. The experimental results indicate that E-SERS provides a more comprehensive and holistic view of an App than prevalent alternatives. E-SERS also succeeds in identifying malicious Apps whose vulnerabilities other ranking schemes failed to address. In the third part of this dissertation, the E-SERS framework is used to propose a trust-aware composition model at two different granularities. This model uses the trust score computed by E-SERS, along with the probability of an App belonging to the malicious category, as the desired attributes for selecting a composition at the two granularities. Finally, the trust-aware composition model is evaluated using the average star rating parameter and the trust score. A holistic approach to computing a trust score, as proposed by E-SERS, will benefit all kinds of Apps, including newly published Apps that follow proper security measures but initially struggle in the AppStore rankings because they have few reviews and ratings. Hence, E-SERS will be helpful to both developers and users. In addition, a composition model that uses such a holistic trust score will enable system integrators to create trust-aware distributed systems for their specific needs.
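SERS is described as applying subjective logic to evidence, but the abstract does not give its formulas. As a minimal sketch, assuming Jøsang's standard evidence-to-opinion mapping for binomial opinions with non-informative prior weight W = 2, positive evidence r and negative evidence s yield belief b = r/(r+s+W), disbelief d = s/(r+s+W), and uncertainty u = W/(r+s+W), with projected probability E = b + a·u for base rate a. Treating each evidence source (e.g., user reviews, a hypothetical security scanner) as contributing positive and negative counts is an assumption made here for illustration; with a shared base rate, summing evidence corresponds to cumulative fusion. The function names and counts are hypothetical, and the sketch does not model E-SERS's temporal or reputation weighting.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

PRIOR_WEIGHT = 2.0  # non-informative prior weight W for binomial opinions


@dataclass
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def expected(self) -> float:
        # Projected probability E = b + a * u
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(r: float, s: float, base_rate: float = 0.5) -> Opinion:
    """Map positive (r) and negative (s) evidence counts to a binomial opinion."""
    total = r + s + PRIOR_WEIGHT
    return Opinion(r / total, s / total, PRIOR_WEIGHT / total, base_rate)


def fuse(sources: Iterable[Tuple[float, float]], base_rate: float = 0.5) -> Opinion:
    """Hypothetical fusion: sum (positive, negative) evidence across sources.

    For evidence-based opinions sharing a base rate, this matches cumulative fusion.
    """
    sources = list(sources)
    r = sum(pos for pos, _ in sources)
    s = sum(neg for _, neg in sources)
    return opinion_from_evidence(r, s, base_rate)


# Hypothetical evidence per App: (positive, negative) from reviews and from a scanner.
app_a = fuse([(40, 5), (3, 0)])  # established App with many reviews
app_b = fuse([(2, 0), (3, 0)])   # newly published App with sparse reviews
```

Here App A's projected probability is 0.88, while the newly published App B carries much higher uncertainty (2/7 versus 0.04), which is the kind of signal an evidence-based ranking can expose rather than hide behind a low review count.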
A Study of Transformer Models for Emotion Classification in Informal Text
(2021-12) Esperanca, Alvaro Soares de Boa; King, Brian; Luo, Xiao; Ding, Zhenming
Textual emotion classification is a task in affective AI that branches from sentiment analysis and focuses on identifying the emotions expressed in a given text excerpt. It has a wide variety of applications that improve human-computer interaction, particularly by enabling computers to better understand subjective human language. Significant research has been done on this task, but very little of it leverages one of the most emotion-bearing symbols in modern communication: emojis. In this thesis, we propose several transformer-based models for emotion classification that process emojis as input tokens and leverage pretrained models, including ReferEmo, a model that processes emojis as textual inputs and leverages DeepMoji to generate affective feature vectors used as a reference when aggregating different modalities of text encoding. To evaluate ReferEmo, we experimented on the SemEval 2018 and GoEmotions datasets, two benchmark datasets for emotion classification, and achieved competitive performance compared to state-of-the-art models tested on these datasets. Notably, our model performs better on the underrepresented classes of each dataset.
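The abstract describes models that treat emojis as input tokens or as textual inputs. As a minimal sketch, the tokenizer below splits text on whitespace, isolates each emoji as its own token, and can optionally replace an emoji with its lowercase Unicode name so that a text-only vocabulary can read it. The code-point ranges are a simplifying heuristic, not the full Unicode emoji specification, and `tokenize` is a hypothetical helper, not the preprocessing used by ReferEmo or DeepMoji.

```python
import unicodedata
from typing import List

# Assumed emoji code-point ranges (heuristic, not the full Unicode emoji list).
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, supplemental symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
]
VARIATION_SELECTOR_16 = "\ufe0f"  # requests emoji-style rendering of the previous char


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def tokenize(text: str, emoji_as_text: bool = False) -> List[str]:
    """Whitespace tokenizer that keeps each emoji as a standalone token."""
    tokens: List[str] = []
    word: List[str] = []

    def flush() -> None:
        # Emit the word accumulated so far, if any.
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if ch == VARIATION_SELECTOR_16:
            continue  # a presentation hint only; drop it
        if is_emoji(ch):
            flush()
            # Optionally turn the emoji into words, e.g. "face with tears of joy".
            tokens.append(unicodedata.name(ch, "emoji").lower() if emoji_as_text else ch)
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    return tokens


example = tokenize("so happy\U0001F602\U0001F602 love it\u2764\ufe0f")
```

Isolating emojis matters because many pretrained subword vocabularies lack most emojis and would otherwise map them to an unknown token. Multi-code-point sequences (skin-tone modifiers, zero-width-joiner sequences) are not grouped by this sketch, and punctuation stays attached to words.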
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
(ACL Anthology, 2021) Li, Raymond; Xiao, Wen; Wang, Lanjun; Jang, Hyeju; Carenini, Giuseppe; Computer Science, Luddy School of Informatics, Computing, and Engineering
Transformers are the dominant architecture in NLP, but training and fine-tuning them remains challenging. In this paper, we present the design and implementation of a visual analytic framework that assists researchers in this process by providing valuable insights into the model’s intrinsic properties and behaviours. Our framework offers an intuitive overview that lets the user explore different facets of the model (e.g., hidden states, attention) through interactive visualization, and provides a suite of built-in algorithms that compute the importance of model components and of different parts of the input sequence. Case studies and feedback from a user focus group indicate that the framework is useful and suggest several improvements. Our framework is available at: https://github.com/raymondzmc/T3-Vis.
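The abstract mentions built-in algorithms for input importance without naming them. One widely used technique of this kind is occlusion (leave-one-out): remove each token and measure how much the model's score changes. The sketch below assumes a hypothetical bag-of-words `toy_score` standing in for a Transformer's output logit; it illustrates the computation and is not T3-Vis's own implementation.

```python
from typing import Callable, Dict, List, Tuple


def occlusion_importance(
    tokens: List[str], score: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Leave-one-out importance: score(full input) - score(input without token i)."""
    base = score(tokens)
    result = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        result.append((tok, base - score(reduced)))
    return result


# Hypothetical toy model: a linear bag-of-words score in place of a real model's logit.
WEIGHTS: Dict[str, float] = {"great": 2.0, "not": -1.5, "movie": 0.1}


def toy_score(tokens: List[str]) -> float:
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


importances = occlusion_importance(["not", "a", "great", "movie"], toy_score)
```

A positive value means removing the token lowers the score, so the token supports the prediction; a negative value means it works against it. With a real Transformer, each removal costs a forward pass, which is why gradient-based methods are often offered alongside occlusion.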