Browsing by Author "Ge, Jiaqi"
Mining Uncertain Sequential Patterns in Iterative MapReduce (Springer, 2015)
Ge, Jiaqi; Xia, Yuni; Wang, Jian; Department of Computer and Information Science, School of Science
This paper proposes a sequential pattern mining (SPM) algorithm for large-scale uncertain databases. Uncertain sequence databases are widely used to model inaccurate or imprecise timestamped data in many real applications, where traditional SPM algorithms are inapplicable because of data uncertainty and scalability issues. In this paper, we develop an efficient approach to manage data uncertainty in SPM and design an iterative MapReduce framework to execute the uncertain SPM algorithm in parallel. We conduct extensive experiments on both synthetic and real uncertain datasets, and the experimental results show that our algorithm is efficient and scalable.

A Naïve Bayesian Classifier in Categorical Uncertain Data Streams (IEEE, 2014-10)
Ge, Jiaqi; Xia, Yuni; Wang, Jian; Department of Computer & Information Science, School of Science
This paper proposes a novel naïve Bayesian classifier for categorical uncertain data streams. Uncertainty in categorical data is usually represented by a vector-valued discrete pdf, which must be handled carefully to guarantee the underlying performance of data mining applications. In this paper, we map the probabilistic attributes to deterministic points in Euclidean space and design a distance-based and a density-based algorithm to measure the correlations between feature vectors and class labels. We also devise a new pre-binning approach to guarantee bounded computation and memory cost in uncertain data stream classification. Experimental results on real uncertain data streams show that our density-based naïve Bayesian classifier is efficient, accurate, and robust to data uncertainty.

Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases (Springer, 2015)
Ge, Jiaqi; Xia, Yuni; Wang, Jian; Department of Computer and Information Science, School of Science
Uncertain sequence databases are widely used to model data with inaccurate or imprecise timestamps in many real-world applications. In this paper, we use uniform distributions to model uncertain timestamps and adopt possible world semantics to interpret temporal uncertain databases. We design an incremental approach to manage temporal uncertainty efficiently, which is integrated into the classic pattern-growth SPM algorithm to mine uncertain sequential patterns. Extensive experiments show that our algorithm performs well in both efficiency and scalability.
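The timestamp model in the last abstract (uniformly distributed timestamps interpreted under possible world semantics) can be illustrated with a minimal sketch. It assumes, for illustration only, that each item occurs at most once per sequence and that timestamps are independent. Under those assumptions, the probability that item a precedes item b is the average of a's CDF over b's timestamp interval, and by linearity of expectation the expected support of the pattern <a, b> over all possible worlds is the sum of these per-sequence probabilities. The names `Event`, `prob_before`, and `expected_support` are hypothetical; the paper's incremental approach and its integration into pattern-growth SPM are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    item: str
    lo: float  # earliest possible timestamp
    hi: float  # latest possible timestamp; lo == hi means a certain timestamp


def prob_before(x: Event, y: Event) -> float:
    """P(T_x < T_y) for independent T_x ~ U[x.lo, x.hi] and T_y ~ U[y.lo, y.hi]."""
    def cdf(t: float) -> float:  # P(T_x < t)
        if x.hi == x.lo:
            return 1.0 if t > x.lo else 0.0
        return min(1.0, max(0.0, (t - x.lo) / (x.hi - x.lo)))

    if y.hi == y.lo:  # certain timestamp for y: just evaluate x's CDF there
        return cdf(y.lo)
    # Integrate x's piecewise-linear CDF exactly over y's interval, then average.
    total = 0.0
    u, v = max(y.lo, x.lo), min(y.hi, x.hi)
    if x.hi > x.lo and v > u:  # region where the CDF rises linearly
        total += ((v - x.lo) ** 2 - (u - x.lo) ** 2) / (2 * (x.hi - x.lo))
    total += max(0.0, y.hi - max(y.lo, x.hi))  # region where the CDF equals 1
    return total / (y.hi - y.lo)


def expected_support(db: list[list[Event]], first: str, second: str) -> float:
    """Expected number of sequences in which `first` occurs before `second`."""
    total = 0.0
    for seq in db:
        by_item = {e.item: e for e in seq}  # assumes each item occurs at most once
        if first in by_item and second in by_item:
            total += prob_before(by_item[first], by_item[second])
    return total


db = [
    [Event("a", 0, 2), Event("b", 1, 3)],  # overlapping intervals: P = 0.875
    [Event("a", 0, 1), Event("b", 2, 3)],  # a always earlier: P = 1.0
    [Event("b", 0, 1), Event("a", 1, 2)],  # a never earlier: P = 0.0
]
print(expected_support(db, "a", "b"))  # 1.875
```

Computing the probability in closed form, rather than enumerating possible worlds, keeps the cost per sequence constant. A miner could then compare the expected support against a minimum-support threshold, though whether the paper uses expected support or a different frequentness measure is not stated in the abstract.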