Browsing by Author "Xia, Yuni"

Aural Mapping of STEM Concepts Using Literature Mining
(2013-03-06) Bharadwaj, Venkatesh; Palakal, Mathew J.; Raje, Rajeev; Xia, Yuni
Recent technological applications have made people's lives heavily dependent on Science, Technology, Engineering, and Mathematics (STEM) and its applications. Understanding basic science is a must in order to use and contribute to this technological revolution. Science education at the middle and high school levels, however, depends heavily on visual representations such as models, diagrams, figures, animations, and presentations. This leaves visually impaired students with very few options to learn science and secure a career in STEM-related areas. Recent experiments have shown that small aural clues called audemes help visually impaired students understand and memorize science concepts. Audemes are non-verbal sound translations of a science concept. To make science concepts available as audemes for visually impaired students, this thesis presents an automatic system for audeme generation from STEM textbooks. This thesis describes the systematic application of multiple Natural Language Processing tools and techniques, such as a dependency parser, a POS tagger, an Information Retrieval algorithm, semantic mapping of aural words, and machine learning, to transform a science concept into a combination of atomic sounds, thus forming an audeme. We present a rule-based classification method for all STEM-related concepts. This work also presents a novel way of mapping and extracting the most related sounds for the words used in the textbook. Additionally, machine learning methods are used in the system to guarantee the customization of the output according to a user's perception.
The system being presented is robust, scalable, fully automatic, and dynamically adaptable for audeme generation.

Automated image classification via unsupervised feature learning by K-means
(2015-07-09) Karimy Dehkordy, Hossein; Dundar, Mehmet Murat; Song, Fengguang; Xia, Yuni
Research on image classification has grown rapidly in the field of machine learning. Many methods have already been implemented for image classification. Among all these methods, the best results have been reported by neural network-based techniques. One of the most important steps in automated image classification is feature extraction. Feature extraction includes two parts: feature construction and feature selection. Many methods for feature extraction exist, but the best ones are related to deep-learning approaches such as network-in-network or deep convolutional network algorithms. Deep learning focuses on levels of abstraction, deriving each higher level of abstraction from the previous one through multiple hidden layers. The two main problems with using deep-learning approaches are the speed and the number of parameters that must be configured. Small changes or poor selection of parameters can alter the results completely or even make them worse. Tuning these parameters is usually impossible for ordinary users who do not have supercomputers, because one must run the algorithm and tune the parameters according to the results obtained. Thus, this process can be very time-consuming. This thesis attempts to address the speed and configuration issues found with traditional deep-network approaches.
Some of the traditional methods of unsupervised learning are used to build an automated image-classification approach that takes less time both to configure and to run.

Biomedical Literature Mining with Transitive Closure and Maximum Network Flow
(http://doi.acm.org/10.1145/1851476.1851552, 2011-05-15) Hoblitzell, Andrew P.; Mukhopadhyay, Snehasis; Xia, Yuni; Fang, Shiafoen
The biological literature is a huge and constantly growing source of information that biologists may consult about their field, but the vast amount of data can sometimes become overwhelming. Medline, which makes a great amount of biological journal data available online, makes the development of automated text mining systems, and hence “data-driven discovery”, possible. This thesis examines current work in the field of text mining and biological literature, and then aims to mine documents pertaining to bone biology. The documents are retrieved from PubMed, and then direct associations between the terms are computed. Potentially novel transitive associations among biological objects are then discovered using the transitive closure algorithm and the maximum flow algorithm. The thesis discusses in detail the extraction of biological objects from the collected documents and the co-occurrence-based text mining algorithm, the transitive closure algorithm, and the maximum network flow algorithm, which were then run to extract the potentially novel biological associations. Generated hypotheses (novel associations) were assigned significance scores for further validation by a bone biology expert. Extension of the work into hypergraphs for enhanced meaning and accuracy is also examined in the thesis.

Bridging Text Mining and Bayesian Networks
(2011-03-09) Raghuram, Sandeep Mudabail; Xia, Yuni; Palakal, Mathew; Zou, Xukai, 1963-
After the initial network is constructed using an expert’s knowledge of the domain, Bayesian networks need to be updated as and when new data is observed.
Literature mining is a very important source of this new data. In this work, we explore what kind of data needs to be extracted with a view to updating Bayesian networks, which existing technologies can be useful in achieving some of the goals, and what research is required to accomplish the remaining requirements. This thesis specifically deals with utilizing causal associations and experimental results that can be obtained from literature mining. However, these associations and numerical results cannot be directly integrated with the Bayesian network. The source of the literature and the perceived quality of the research need to be factored into the process of integration, just as a human reading the literature would do. This thesis presents a general methodology for updating a Bayesian network with the mined data. This methodology consists of solutions to some of the issues surrounding the task of integrating the causal associations with the Bayesian network and demonstrates the idea with a semiautomated software system.

CyberWater: An Open Framework for Data and Model Integration
(2024-05) Chen, Ranran; Liang, Yao; Song, Fengguang; Xia, Yuni; Zheng, Jiangyu
Workflow management systems (WMSs) are commonly used to organize and automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich, local environments for fast prototyping and refinements before they consider using more powerful computing resources. This dissertation delves into the development of the CyberWater framework, which is based on WMSs. Against the backdrop of data-intensive and complex models, CyberWater exemplifies the transition of intricate data into insightful and actionable knowledge. The dissertation introduces the architecture of CyberWater, focusing particularly on its adaptation and enhancement from the VisTrails system.
It highlights the significance of control and data flow mechanisms and the introduction of new data formats for effective data processing within the CyberWater framework. This study presents an in-depth analysis of the design and implementation of Generic Model Agent Toolkits. The discussion centers on template-based component mechanisms and the integration with popular platforms, while emphasizing the toolkits' ability to facilitate on-demand access to High-Performance Computing (HPC) resources for large-scale data handling. In addition, the development of an asynchronously controlled workflow within CyberWater is explored. This approach enhances computational performance by optimizing pipeline-level parallelism and allows for on-demand submissions of HPC jobs, significantly improving the efficiency of data processing. This research also introduces a comprehensive methodology for model-driven development and Python code integration within the CyberWater framework, along with applications of GPT models for automated data retrieval. It examines the implementation of Git Actions for system automation in data retrieval processes and discusses the transformation of raw data into a compatible format, enhancing the adaptability and reliability of the data retrieval component in the adaptive generic model agent toolkit. For the development and maintenance of software within the CyberWater framework, tools such as GitHub are used for version control, and automated processes are outlined for software updates and error reporting. In addition, the discussion of user data collection emphasizes the role of the CyberWater Server in these processes.
In conclusion, this dissertation presents our comprehensive work on the CyberWater framework’s advancements, setting new standards in scientific workflow management and demonstrating how technological innovation can significantly elevate the process of scientific discovery.

DCMS: A data analytics and management system for molecular simulation
(SpringerOpen, 2014-11-26) Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem; Fogarty, Joseph C.; Tu, Yi-Cheng; Zhu, Xingquan; Pandit, Sagar A.; Xia, Yuni; Department of Computer and Information Science, School of Science
Molecular Simulation (MS) is a powerful tool for studying physical and chemical features of large systems and has seen applications in many scientific and engineering domains. During the simulation process, the experiments generate a very large number of atoms and aim to observe their spatial and temporal relationships for scientific analysis. The sheer data volumes and their intensive interactions impose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform to support applications that involve intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team developed in the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is to handle the analytical queries, which are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS.
Index structures are generated to store analysis results that may be interesting to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems. We also used it as a platform to test other data management issues such as security and compression.

Decision Support System For Geriatric Care
(Office of the Vice Chancellor for Research, 2010-04-09) Palakal, Mathew; Pandit, Yogesh; Jones, Josette; Xia, Yuni; Bandos, Jean; Geesaman, Jerry; Pecenka, Dave; Tinsley, Eric
Geriatrics is a branch of medicine that focuses on the healthcare of the elderly. We propose to build a decision support system for elderly care based on a knowledge base system that incorporates best practices reported in the literature. A Bayesian network model is then used for decision support in the geriatric care tool that we develop.

Decision Support System for Geriatric Care
Pandit, Yogesh; Palakal, Mathew J.; Jones, Josette; Xia, Yuni; Pecenka, Dave; Bandos, Jean; Tinsley, Eric; Geesaman, Jerry
Geriatrics is a branch of medicine that focuses on the healthcare of the elderly. It is a field that promotes health and aims to prevent and treat diseases and disabilities in older people. Geriatric interventions are published in many different articles and journals.

Design and Implementation of Web-based Data and Network Management System for Heterogeneous Wireless Sensor Networks
(2011-03-09) Yu, Qun; Liang, Yao; Zou, Xukai; Xia, Yuni
Today, Wireless Sensor Networks (WSNs) form an exciting new area with dramatic impacts on science and engineering innovations. New WSN-based technologies, such as body sensor networks in medical and health care and environmental monitoring sensor networks, are emerging.
Sensor networks are quickly becoming a flexible, inexpensive, and reliable platform to provide solutions for a wide variety of applications in real-world settings. The proliferation of sensor networks has been paralleled by the use of more heterogeneous systems in deployment. In this thesis, we develop a new network management and data collection framework for heterogeneous wireless sensor networks, called the Heterogeneous Wireless Sensor Networks Management System (H-WSNMS), which makes it possible to manage and operate various sensor network systems with unified control and management services and interfaces. The H-WSNMS framework aims to provide a scheme to manage, query, and interact with sensor network systems. By introducing the concept of a Virtual Command Set (VCS), a series of unified application interfaces and metadata (XML files) across multiple WSNs are designed to provide scalability and flexibility in the management functions for heterogeneous wireless sensor networks, which is demonstrated through a series of web-based WSN management applications such as monitoring, configuration, reprogramming, data collection, and so on. The tests and application trials confirm the feasibility of our approach but also reveal a number of challenges to be taken into account when deploying wireless sensor and actuator networks at industrial sites, which will be considered in our future research.

Extracting Symptoms from Narrative Text using Artificial Intelligence
(2020-12) Gandhi, Priyanka; Zou, Xukai; Luo, Xiao; Xia, Yuni
Electronic health records collect an enormous amount of data about patients. However, the information about the patient’s illness is stored in progress notes that are in an unstructured format. It is difficult for humans to annotate symptoms listed in the free text. Recently, researchers have explored how advances in deep learning can be applied to process biomedical data.
The information in the text can be extracted with the help of natural language processing. The research presented in this thesis aims at automating the process of symptom extraction. The proposed methods use pre-trained word embeddings such as BioWord2Vec, BERT, and BioBERT to generate vectors of the words based on the semantics and syntactic structure of sentences. BioWord2Vec embeddings are fed into a BiLSTM neural network with a CRF layer to capture the dependencies between the co-related terms in the sentence. The pre-trained BERT and BioBERT embeddings are fed into the BERT model with a CRF layer to analyze the output tags of neighboring tokens. The research shows that with the help of the CRF layer in neural network models, longer phrases of symptoms can be extracted from the text. The proposed models are compared with the UMLS MetaMap tool, which uses various sources to categorize the terms in the text into different semantic types, and with Stanford CoreNLP, a dependency parser that analyzes syntactic relations in the sentence to extract information. The performance of the models is analyzed using strict, relaxed, and n-gram evaluation schemes. The results show that BioBERT with a CRF layer can extract the majority of the human-labeled symptoms. Furthermore, the model is used to extract symptoms from COVID-19 tweets. The model was able to extract symptoms listed by the CDC as well as new symptoms.