Browsing by Author "Department of Computer and Information Science, School of Science"
Now showing 1 - 10 of 13
Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams (ACM, 2016-10)
Zhang, Baichuan; Dundar, Murat; Al Hasan, Mohammad; Department of Computer and Information Science, School of Science
The name entity disambiguation task aims to partition the records of multiple real-life persons so that each partition contains records pertaining to a unique person. Most existing solutions for this task operate in batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that name disambiguation be performed in an online fashion and that records of new ambiguous entities with no preexisting records be identified. In this work, we propose a Bayesian non-exhaustive classification framework for solving the online name disambiguation task. Our proposed method uses a Dirichlet process prior with a Normal × Normal × Inverse Wishart data model, which enables identification of new ambiguous entities who have no records in the training data. For online classification, we use a one-sweep Gibbs sampler, which is very efficient and effective. As a case study, we consider bibliographic data in a temporal stream format and disambiguate authors by partitioning their papers into homogeneous groups. Our experimental results demonstrate that the proposed method outperforms existing methods on the online name disambiguation task.

DCMS: A data analytics and management system for molecular simulation (SpringerOpen, 2014-11-26)
Kumar, Anand; Grupcev, Vladimir; Berrada, Meryem; Fogarty, Joseph C.; Tu, Yi-Cheng; Zhu, Xingquan; Pandit, Sagar A.; Xia, Yuni; Department of Computer and Information Science, School of Science
Molecular Simulation (MS) is a powerful tool for studying physical/chemical features of large systems and has seen applications in many scientific and engineering domains.
During the simulation process, experiments generate data on a very large number of atoms, whose spatial and temporal relationships researchers intend to observe for scientific analysis. The sheer data volumes and their intensive interactions pose significant challenges for data access, management, and analysis. To date, existing MS software systems fall short on storage and handling of MS data, mainly because they lack a platform that supports applications involving intensive data access and analytical processing. In this paper, we present the database-centric molecular simulation (DCMS) system our team has developed over the past few years. The main idea behind DCMS is to store MS data in a relational database management system (DBMS) to take advantage of the declarative query interface (i.e., SQL), data access methods, query processing, and optimization mechanisms of modern DBMSs. A unique challenge is handling analytical queries, which are often compute-intensive. For that, we developed novel indexing and query processing strategies (including algorithms running on modern co-processors) as integrated components of the DBMS. As a result, researchers can upload and analyze their data using efficient functions implemented inside the DBMS. Index structures are generated to store analysis results that may be of interest to other users, so that the results are readily available without duplicating the analysis. We have developed a prototype of DCMS based on the PostgreSQL system, and experiments using real MS data and workloads show that DCMS significantly outperforms existing MS software systems.
We also used it as a platform to test other data management issues such as security and compression.

Dependability and Security in Medical Information System (Springer Nature, 2007)
Zou, Xukai; Dai, Yuan-Shun; Doebbeling, Bradley; Qi, Mingrui; Department of Computer and Information Science, School of Science
Medical Information Systems (MIS) significantly help medical practice and health care. Security and dependability are two increasingly important factors for MIS today. On the one hand, people will be willing to step into the MIS age only when their privacy and integrity can be protected and guaranteed by MIS. On the other hand, only secure and reliable MIS can provide safe and solid medical and health care services. In this paper, we discuss some new security and reliability technologies that are necessary for, and can be integrated with, existing MISs to make the systems highly secure and dependable. We also present an implemented middleware architecture that has been seamlessly and transparently integrated with the existing VISTA/CPRS system in the U.S. Department of Veterans Affairs.

Enabling Self-healing Smart Grid Through Jamming Resilient Local Controller Switching (IEEE, 2015-09)
Liu, Hongbo; Chen, Yingying; Chuah, Mooi Choo; Yang, Jie; Poor, H. Vincent; Department of Computer and Information Science, School of Science
A key component of a smart grid is its ability to collect useful information from a power grid, enabling control centers to estimate the current state of the power grid. Such information can be delivered to the control centers via wireless or wired networks. It is envisioned that wireless technology will be widely used for local-area communication subsystems in the smart grid (e.g., in distribution networks). However, attacks with serious impact, such as channel jamming and denial-of-service attacks, can be launched against wireless networks.
In particular, jamming attacks can cause significant damage to power grids; for example, delayed delivery of time-critical messages can prevent control centers from properly controlling generator outputs to match load demands. In this paper, a communication subsystem with enhanced self-healing capability in the presence of jamming is designed via intelligent local controller switching combined with a retransmission mechanism. The proposed framework allows sufficient readings from smart meters to be continuously collected by various local controllers to estimate the states of a power grid under various attack scenarios. The jamming probability is also analyzed, considering the impact of jammer power and shadowing effects. In addition, guidelines on the optimal placement of local controllers to ensure effective switching of smart meters under jamming are provided. Via theoretical, experimental, and simulation studies, we demonstrate that the proposed system is effective in maintaining communications between smart meters and local controllers even when multiple jammers are present in the network.

An Experimental Distributed Framework for Distributed Simultaneous Localization and Mapping (IEEE, 2016-05)
Gamage, Ruwan; Tuceryan, Mihran; Department of Computer and Information Science, School of Science
Simultaneous Localization and Mapping (SLAM) is widely used in applications such as rescue, navigation, semantic mapping, augmented reality, and home entertainment. Most of these applications would benefit from using multiple devices in a distributed setting. Distributed SLAM research would in turn benefit from a framework in which the complexities of network communication are already handled. In this paper, we introduce such a framework, built on the open-source Robot Operating System (ROS) and the VirtualBox virtualization software.
Furthermore, we describe a way to measure communication statistics of the distributed SLAM system.

Kernelized Sparse Self-Representation for Clustering and Recommendation (SIAM, 2016)
Bian, Xiao; Li, Feng; Ning, Xia; Department of Computer and Information Science, School of Science
Sparse models have demonstrated substantial success in data analysis applications such as clustering, classification, and denoising. However, most current work is built on the assumption that data are distributed in a union of subspaces, whereas limited work has been conducted on nonlinear datasets, where data reside in a union of manifolds rather than a union of subspaces. To capture data nonlinearity using sparse models, in this paper we propose to exploit the self-representation property of nonlinear data in an implicit feature space using kernel methods. We propose a kernelized sparse self-representation model, denoted KSSR, and a novel Kernelized Fast Iterative Soft-Thresholding Algorithm, denoted K-FISTA, to recover the underlying nonlinear structure among the data. We evaluate our method on clustering problems using both synthetic and real-world datasets, and demonstrate its superior performance compared to other state-of-the-art methods. We also apply our method to collaborative filtering in recommender systems, and demonstrate its great potential for novel applications beyond clustering.

Machine Learning Techniques for Prediction of Early Childhood Obesity (Schattauer, 2015-08-12)
Dugan, T.M.; Mukhopadhyay, S.; Carroll, A.; Downs, S.; Department of Computer and Information Science, School of Science
Objectives: This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA.
Methods: Analyses of six different machine learning methods (RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes) trained on CHICA data show that an accurate, sensitive model can be created. Results: Of the methods analyzed, the ID3 model trained on the CHICA dataset showed the best overall performance, with an accuracy of 85% and a sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. Conclusions: This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.

Mining Uncertain Sequential Patterns in Iterative MapReduce (Springer, 2015)
Ge, Jiaqi; Xia, Yuni; Wang, Jian; Department of Computer and Information Science, School of Science
This paper proposes a sequential pattern mining (SPM) algorithm for large-scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability. In this paper, we develop an efficient approach to managing data uncertainty in SPM and design an iterative MapReduce framework to execute the uncertain SPM algorithm in parallel. We conduct extensive experiments on both synthetic and real uncertain datasets.
The experimental results show that our algorithm is efficient and scalable.

Multi-Task Multi-Dimensional Hawkes Processes for Modeling Event Sequences (ACM, 2015-07)
Luo, Dixin; Xu, Hongteng; Zhen, Yi; Ning, Xia; Zha, Hongyuan; Yang, Xiaokang; Zhang, Wenjun; Department of Computer and Information Science, School of Science
We propose a Multi-task Multi-dimensional Hawkes Process (MMHP) for modeling event sequences in which there exist multiple triggering patterns within sequences and structures across sequences. MMHP models the dynamics of multiple sequences jointly by imposing structural constraints, and thus systematically uncovers clustering structure among sequences. We propose an effective and robust optimization algorithm to learn MMHP models, which takes advantage of the alternating direction method of multipliers (ADMM), majorization minimization, and Euler-Lagrange equations. Our experimental results demonstrate that MMHP performs well on both synthetic and real data.

PInfer: Learning to Infer Concurrent Request Paths from System Kernel Events (IEEE, 2016-07)
Xu, Hongteng; Ning, Xia; Zhang, Hui; Rhee, Junghwan; Jiang, Guofei; Department of Computer and Information Science, School of Science
Operating system kernel-level tracers are widely used in the post-development stage by black-box approaches. By inferring service request processing paths from kernel events, these approaches enable system diagnosis and performance management that are application-logic aware. However, asynchronous communications and multi-threading behaviors make request path patterns dynamic at the kernel event level; as a result, previous methods have focused on either software instrumentation techniques or better statistical inference models. In this paper, we propose a novel learning-based approach called PInfer that automatically infers request processing path patterns with high precision.
PInfer first learns dynamic event patterns of inter-thread and intra-thread service processing from training data of sequential requests. On test data containing concurrent requests, PInfer infers individual request processing paths by effectively solving a graph matching problem and a generalized assignment problem based on the learned patterns. We have implemented our approach in a proprietary system performance diagnosis tool and present performance results on 40 sets of kernel event traces. PInfer achieves on average 65% precision and 85% recall in profiling concurrent request processing paths.
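The online name disambiguation abstract in this listing pairs a Dirichlet process prior with a one-sweep Gibbs sampler, so that each arriving record can be assigned either to a known entity or to a brand-new one. The sketch below illustrates only that single-pass assignment idea, under loud simplifying assumptions: it replaces the paper's Normal × Normal × Inverse Wishart model with a one-dimensional Gaussian likelihood of known variance and a conjugate Normal prior on each cluster mean, and the function and parameter names (`one_sweep_dp_assign`, `alpha`, `sigma2`, `mu0`, `tau02`) are hypothetical, not the authors' code.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Normal(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def one_sweep_dp_assign(stream, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=100.0, seed=0):
    """Assign each arriving value to an existing cluster or a new one in one pass.

    Hypothetical simplified model: x ~ Normal(mu_k, sigma2) with a conjugate
    Normal(mu0, tau02) prior on each cluster mean mu_k, and a Chinese-restaurant
    (Dirichlet process) prior with concentration alpha over cluster labels.
    """
    rng = random.Random(seed)
    counts, sums = [], []  # per-cluster sufficient statistics
    labels = []
    for x in stream:
        weights = []
        for n_k, s_k in zip(counts, sums):
            # Conjugate posterior of the cluster mean given its n_k points.
            post_var = 1.0 / (1.0 / tau02 + n_k / sigma2)
            post_mean = post_var * (mu0 / tau02 + s_k / sigma2)
            # CRP weight n_k times the posterior predictive density of x.
            weights.append(n_k * normal_pdf(x, post_mean, sigma2 + post_var))
        # Weight for opening a new cluster: alpha times the prior predictive.
        weights.append(alpha * normal_pdf(x, mu0, sigma2 + tau02))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):  # a previously unseen entity
            counts.append(0)
            sums.append(0.0)
        counts[k] += 1
        sums[k] += x
        labels.append(k)
    return labels
```

For example, `one_sweep_dp_assign([0.0, 0.1, -0.1, 50.0, 50.2, 49.9])` places the values near 50 in clusters disjoint from those near 0, because the new-cluster weight dwarfs the predictive density of the existing cluster. Each record is visited exactly once and never revisited, which is what keeps a one-sweep scheme cheap in a streaming setting.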
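The KSSR abstract in this listing describes recovering a sparse self-representation of each point in an implicit kernel feature space. A minimal sketch of that idea, assuming a Gaussian (RBF) kernel and the standard FISTA iteration (Beck and Teboulle) applied to a kernelized lasso objective, is shown below; it is not the paper's K-FISTA, and the names `kernel_fista_self_rep`, `lam`, and `gamma` are hypothetical. The key point is that the gradient of 0.5 * ||phi(x_i) - sum_j c_j phi(x_j)||^2 is K c - K[:, i], so only kernel values are needed.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def soft_threshold(v, t):
    """Proximal operator of t * |v| (the 'soft-thresholding' step)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def kernel_fista_self_rep(points, i, lam=0.01, gamma=1.0, iters=500):
    """Sparse coefficients c minimizing
        0.5 * ||phi(x_i) - sum_j c_j phi(x_j)||^2 + lam * ||c||_1  subject to c_i = 0,
    computed from kernel evaluations only (phi is never formed explicitly)."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    # trace(K) upper-bounds the largest eigenvalue of the PSD matrix K,
    # so 1 / trace(K) is a safe step size for the smooth part.
    step = 1.0 / sum(K[j][j] for j in range(n))
    c = [0.0] * n
    y = c[:]
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth part at y: K y - K[:, i].
        grad = [sum(K[j][m] * y[m] for m in range(n)) - K[j][i] for j in range(n)]
        c_new = [soft_threshold(y[j] - step * grad[j], step * lam) for j in range(n)]
        c_new[i] = 0.0  # a point may not represent itself
        # FISTA momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [c_new[j] + ((t - 1.0) / t_new) * (c_new[j] - c[j]) for j in range(n)]
        c, t = c_new, t_new
    return c
```

With points `[[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [-1.0, 0.5]]`, the coefficients for point 0 concentrate on point 1, its near neighbor. In a clustering pipeline, such coefficient vectors are typically stacked into an affinity matrix for spectral clustering.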
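The childhood obesity abstract in this listing reports accuracy, sensitivity, positive predictive value, and negative predictive value for its ID3 model. For reference, these are the standard confusion-matrix definitions; the function name `binary_metrics` is hypothetical, and the counts in the example below are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
```

With these hypothetical counts, accuracy is 80/100 = 0.80, sensitivity is 30/40 = 0.75, PPV is 30/40 = 0.75, and NPV is 50/60 ≈ 0.83. Sensitivity and NPV both depend on the false negatives, which matters when the goal is to avoid missing children at risk.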
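MMHP, described in the Hawkes process abstract in this listing, builds on the multidimensional Hawkes process, in which each past event raises the intensity of future events in every dimension. A minimal sketch of the standard conditional intensity with an exponential triggering kernel is below. The kernel choice and the names `hawkes_intensity`, `mu`, `A`, and `omega` are assumptions for illustration; the sketch does not cover the multi-task structure or the ADMM-based learning described in the abstract.

```python
import math


def hawkes_intensity(t, history, mu, A, omega=1.0):
    """Conditional intensity of each dimension u at time t:

        lambda_u(t) = mu[u] + sum over past events (t_i, u_i) with t_i < t of
                      A[u][u_i] * omega * exp(-omega * (t - t_i))

    history: list of (time, dimension) pairs; mu: base rates per dimension;
    A[u][v]: how strongly an event in dimension v excites dimension u.
    """
    lam = list(mu)  # start from the base rates (copy, so mu is not mutated)
    for t_i, u_i in history:
        if t_i < t:  # only strictly earlier events can trigger
            decay = omega * math.exp(-omega * (t - t_i))
            for u in range(len(mu)):
                lam[u] += A[u][u_i] * decay
    return lam
```

For instance, with `mu = [0.1, 0.2]`, `A = [[0.5, 0.0], [0.3, 0.4]]`, and events at times 0.0 (dimension 0) and 1.0 (dimension 1), the intensity of dimension 0 at time 2.0 is 0.1 + 0.5·e^(-2), since events in dimension 1 do not excite dimension 0 under this `A`.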