Browsing by Subject "Data Mining"
Efficient IoT Big Data Streaming With Deep-Learning-Enabled Dynamics
(IEEE, 2022-11-11) Wong, Junhua; Piuri, Vincenzo; Scotti, Fabio; Zhang, Qingxue; Electrical and Computer Engineering, School of Engineering and Technology
Internet of Medical Things (IoMT) is igniting many emerging smart health applications by continuously streaming big data for data-driven innovations. One critical obstacle in IoMT big data is the power hungriness of long-term data transmission. Targeting this challenge, we propose a novel framework called IoMT big-data Bayesian-backward deep-encoder learning (IBBD), which mines deep autoencoder (AE) configurations for data sparsification and determines optimal tradeoffs between information loss and power overhead. More specifically, the IBBD framework leverages an additional external Bayesian-backward loop that recommends AE configurations, on top of a traditional deep learning loop that executes and evaluates the AE quality. The IBBD recommendation is based on confidence to further minimize the regularized metrics that quantify the quality of AE configurations, and it further leverages regularization techniques to allow adjusting error–power tradeoffs in the mining process. We have conducted thorough experiments on a cardiac data streaming application and demonstrated the superiority of IBBD over common practices such as discrete wavelet transform, and we have further generalized IBBD by validating on other users the optimal AE configurations determined on one user. This study is expected to greatly advance IoMT big data streaming practices toward precision medicine.

Ensemble methods for top-N recommendation
(2018-04-20) Fan, Ziwei; Ning, Xia
As the amount of information grows, the desire to efficiently filter out unnecessary information and retain relevant or interesting information for people is increasing. To extract the information that will be of interest to people efficiently, we can utilize recommender systems.
Recommender systems are information filtering systems that predict the preference of a user for an item. Based on historical data of users, recommender systems are able to make relevant recommendations to users. Due to their usefulness, recommender systems have been widely used in many applications, including e-commerce and healthcare information systems. However, existing recommender systems suffer from several issues, including data sparsity and user/item heterogeneity. In this thesis, a hybrid dynamic and multi-collaborative filtering based recommendation technique has been developed to recommend search terms for physicians when physicians review a large number of patients’ information. Besides, a local sparse linear method ensemble has been developed to tackle the issues of data sparsity and user/item heterogeneity. In health information technology systems, most physicians suffer from information overload when they review patient information. A novel hybrid dynamic and multi-collaborative filtering method has been developed to improve information retrieval from electronic health records. We tackle the problem of recommending the next search term to a physician while the physician is searching for information about a patient. In this method, I have combined first-order Markov Chain and multi-collaborative filtering methods. For multi-collaborative filtering methods, I have developed the physician-patient collaborative filtering and transition-involved collaborative filtering methods. The developed method is tested using electronic health record data from the Indiana Network for Patient Care. The experimental results demonstrate that for 46.7% of test cases, this new method is able to correctly prioritize relevant information among top-5 recommendations that physicians are truly interested in. The local sparse linear model ensemble has been developed to tackle both the data sparsity and the user/item heterogeneity issues for top-N recommendation.
Multiple local sparse linear models are learned for all the users and items in the system. I have developed similarity-based and popularity-based methods to determine the local training data for each local model. Each local model is trained using the Sparse Linear Method (SLIM), which is a powerful technique for top-N recommendation. These learned models are then combined in various ways to produce top-N recommendations. I have developed model results combination and model combination methods to combine all learned local models. The developed methods are tested on a benchmark dataset and its sparsified datasets. The experiments demonstrate an 18.4% improvement from such ensemble models, particularly on sparse datasets.

Extraction of pharmacokinetic evidence of drug-drug interactions from the literature
(PLoS, 2015-05-11) Kolchinsky, Artemy; Lourenço, Anália; Wu, Heng-Yi; Li, Lang; Rocha, Luis M.; Department of Medical and Molecular Genetics, IU School of Medicine
Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations.
We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.

Interactive pattern mining of neuroscience data
(2014-01-29) Waranashiwar, Shruti Dilip; Mukhopadhyay, Snehasis; Durresi, Arjan; Xia, Yuni
Text mining is a process of extraction of knowledge from unstructured text documents. We have huge volumes of text documents in digital form. It is impossible to manually extract knowledge from these vast texts.
Hence, text mining is used to find useful information from text through the identification and exploration of interesting patterns. The objective of this thesis in the text mining area is to find compact but high-quality frequent patterns from text documents related to the neuroscience field. We try to prove that an interactive sampling algorithm is efficient in terms of time when compared with exhaustive methods like FP Growth using the RapidMiner tool. Instead of mining all frequent patterns, not all of which may be interesting to the user, an interactive method that mines only desired and interesting patterns is a far better approach in terms of resource utilization. This is especially observed with a large number of keywords. In interactive pattern mining, a user gives feedback on whether a pattern is interesting or not. Using the Markov Chain Monte Carlo (MCMC) sampling method, frequent patterns are generated in an interactive way. The thesis discusses extraction of patterns between the keywords related to some of the common disorders in neuroscience in an interactive way. The PubMed database and keywords related to schizophrenia and alcoholism are used as inputs. This thesis reveals many associations between the different terms, which are otherwise difficult to understand by reading articles or journals manually. The Graphviz tool is used to visualize associations.

Real-time road traffic events detection and geo-parsing
(2018-08-08) Kumar, Saurabh; Koskie, Sarah
In the 21st century, there is an increasing number of vehicles on the road as well as a limited road infrastructure. These aspects culminate in daily challenges for the average commuter due to congestion and slow-moving traffic. In the United States alone, congestion costs the average driver $1200 every year in fuel and time. Some positive steps, including (a) introduction of the push notification system and (b) deploying more law enforcement troops, have been taken for better traffic management.
However, these methods have limitations and require extensive planning. Another method to deal with traffic problems is to track the congested areas in a city using social media. Law enforcement resources can then be re-routed to these areas on a real-time basis. Given the ever-increasing number of smartphone devices, social media can be used as a source of information to track traffic-related incidents. Social media sites allow users to share their opinions and information. Platforms like Twitter, Facebook, and Instagram are very popular among users. These platforms enable users to share whatever they want in the form of text and images. Facebook users generate millions of posts in a minute. On these platforms, abundant data, including news, trends, events, opinions, product reviews, etc., are generated on a daily basis. Worldwide, organizations are using social media for marketing purposes. This data can also be used to analyze traffic-related events like congestion, construction work, slow-moving traffic, etc. Thus the motivation behind this research is to use social media posts to extract information relevant to traffic, with effective and proactive traffic administration as the primary focus. I propose an intuitive two-step process to utilize Twitter users' posts for retrieving traffic-related information on a real-time basis. It uses a text classifier to retain only the posts that contain traffic information. This is followed by a Part-Of-Speech (POS) tagger to find the geolocation information. A prototype of the proposed system is implemented using a distributed microservices architecture.

Text mining for drug-drug interaction
(Springer-Verlag, 2014) Wu, Heng-Yi; Chiang, Chien-Wei; Li, Lang; Department of Medicine, IU School of Medicine
In order to understand the mechanisms of drug-drug interaction (DDI), the study of pharmacokinetics (PK), pharmacodynamics (PD), and pharmacogenetics (PG) data is significant.
In recent years, drug PK parameters, drug interaction parameters, and PG data have been unevenly collected in different databases and published extensively in literature. Also, the lack of an appropriate PK ontology and a well-annotated PK corpus, which provide the background knowledge and the criteria of determining DDI, respectively, leads to difficulty in developing DDI text mining tools for PK data collection from the literature and data integration from multiple databases. To address these issues, we constructed a comprehensive pharmacokinetics ontology. It includes all aspects of in vitro pharmacokinetics experiments, in vivo pharmacokinetics studies, as well as drug metabolism and transportation enzymes. Using our pharmacokinetics ontology, a PK corpus was constructed to present four classes of pharmacokinetics abstracts: in vivo pharmacokinetics studies, in vivo pharmacogenetic studies, in vivo drug interaction studies, and in vitro drug interaction studies. A novel hierarchical three-level annotation scheme was proposed and implemented to tag key terms, drug interaction sentences, and drug interaction pairs. The utility of the pharmacokinetics ontology was demonstrated by annotating three pharmacokinetics studies, and the utility of the PK corpus was demonstrated by a drug interaction extraction text mining analysis. The pharmacokinetics ontology annotates both in vitro pharmacokinetics experiments and in vivo pharmacokinetics studies. The PK corpus is a highly valuable resource for the text mining of pharmacokinetics parameters and drug interactions.