- Browse by Date Submitted
Department of Computer and Information Science Works
Now showing 1 - 10 of 281
Item: Reliable Video over Software-Defined Networking (RVSDN) (IEEE, 2014-12)
Owens, Harold II; Durresi, Arjan; Jain, Raj; Department of Computer & Information Science, IU School of Science

Ensuring end-to-end quality of service for video applications requires the network to choose the most feasible path in terms of bandwidth, delay, and jitter. Quality of service can only be ensured if the paths are reliable, that is, if they perform to specification per request. This paper makes four contributions. First, it presents Reliable Video over Software-Defined Networking (RVSDN), which builds on previous work on Video over Software-Defined Networking (VSDN) to address the problem of finding the most reliable path(s) through the network for video applications. Second, it presents the design and implementation of RVSDN. Third, it describes the experience of integrating RVSDN into ns-3, a network simulator used by the research community to simulate and model computer networks. Finally, it reports the performance of RVSDN in terms of the number of requests serviced by the network architecture: by aggregating reliability across network paths, RVSDN services 31 times more requests than VSDN and MPLS explicit routing when the reliability constraint is 0.995 or greater.

Item: Electronic health information quality challenges and interventions to improve public health surveillance data and practice (Association of Schools of Public Health, 2013)
Dixon, Brian E.; Siegel, Jason A.; Oemig, Tanya V.; Grannis, Shaun J.; Computer & Information Science, School of Science

OBJECTIVE: We examined completeness, an attribute of data quality, in the context of electronic laboratory reporting (ELR) of notifiable disease information to public health agencies. METHODS: We extracted more than seven million ELR messages from multiple clinical information systems in two states.
We calculated and compared the completeness of various data fields within the messages that were identified as important to public health reporting processes. We compared unaltered, original messages from source systems with similar messages from another state, as well as with messages enriched by a health information exchange (HIE). Our analysis focused on calculating completeness (i.e., the number of non-missing values) for fields deemed important for inclusion in notifiable disease case reports. RESULTS: The completeness of data fields for laboratory transactions varied across clinical information systems and jurisdictions. Fields identifying the patient and test results were usually complete (97%-100%). Fields containing patient demographics, patient contact information, and provider contact information were suboptimal (6%-89%). Transactions enhanced by the HIE were more complete (increases ranged from 2% to 25%) than the original messages. CONCLUSION: ELR data from clinical information systems can be of suboptimal quality. Public health monitoring of data sources and augmentation of ELR message content using HIE services can improve data quality.

Item: Sampling Triples from Restricted Networks Using MCMC Strategy (ACM, 2014)
Rahman, Mahmudur; Hasan, Mohammad Al; Department of Computer Science, IUPUI

In large networks, connected triples are useful for solving various tasks, including link prediction, community detection, and spam filtering. Existing work in this direction is concerned mostly with exact or approximate counting of connected triples that are closed (a.k.a. triangles). The task of triple sampling has not been explored in depth, although sampling is a more fundamental task than counting, and the former is useful for solving various other tasks, including counting. In recent years, some triple-sampling methods based on direct sampling have been proposed, solely for the purpose of triangle count approximation.
They sample only from a uniform distribution and are not effective for sampling triples from an arbitrary user-defined distribution. In this work we present two indirect triple-sampling methods based on the Markov Chain Monte Carlo (MCMC) sampling strategy. Both methods are highly efficient compared to a direct-sampling-based method, specifically for the task of sampling from a non-uniform probability distribution. Another significant advantage of the proposed methods is that they can sample triples from networks with restricted access, to which a direct-sampling-based method simply does not apply.

Item: FS3: A Sampling based method for top-k Frequent Subgraph Mining (2015)
Saha, Tanay Kumar; Al Hasan, Mohammad; Department of Computer & Information Science, School of Science

Mining labeled subgraphs is a popular research task in data mining because of its potential application in many different scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they do not scale when the graphs in the input database are large. In this work, we propose FS3, a sampling-based method that mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS3 is equipped with an innovative queue manager that stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs.
Our experiments on databases of large graphs show that FS3 is efficient and obtains subgraphs that are the most frequent among the subgraphs of a given size.

Item: A Naïve Bayesian Classifier in Categorical Uncertain Data Streams (IEEE, 2014-10)
Ge, Jiaqi; Xia, Yuni; Wang, Jian; Department of Computer & Information Science, School of Science

This paper proposes a novel naïve Bayesian classifier for categorical uncertain data streams. Uncertainty in categorical data is usually represented by a vector-valued discrete pdf, which has to be handled carefully to guarantee performance in data mining applications. In this paper, we map probabilistic attributes to deterministic points in Euclidean space and design a distance-based and a density-based algorithm to measure the correlations between feature vectors and class labels. We also devise a new pre-binning approach to guarantee bounded computation and memory cost in uncertain data stream classification. Experimental results on real uncertain data streams show that our density-based naïve Bayesian classifier is efficient, accurate, and robust to data uncertainty.

Item: Localized Temporal Profile of Surveillance Video (IEEE, 2014-07)
Bagheri, Saeid; Zheng, Jiang Yu; Department of Computer & Information Science, School of Science

Surveillance videos are recorded pervasively, and their retrieval currently still relies on human operators. As an intermediate representation, this work develops a new temporal profile of video that conveys accurate temporal information while keeping certain spatial characteristics of targets of interest for recognition. The profile is obtained at critical positions where major target flow appears. We set a sampling line crossing the motion direction to profile passing targets in the temporal domain.
To add spatial information to the temporal profile to a certain extent, we integrate multiple profiles from a set of lines with a blending method to reflect the target's motion direction and position in the temporal profile. Unlike mosaicing/montage methods for video synopsis in the spatial domain, our temporal profile has no limit on time length, and the created profile significantly reduces the data size for brief indexing and fast search of video.

Item: The Infinite Mixture of Infinite Gaussian Mixtures (2015)
Yerebakan, Halid Z.; Rajwa, Bartek; Dundar, Murat; Department of Computer & Information Science, School of Science

The Dirichlet process mixture of Gaussians (DPMG) has been used in the literature for clustering and density estimation problems. However, many real-world data sets exhibit cluster distributions that cannot be captured by a single Gaussian. Modeling such data sets with DPMG creates several extraneous clusters, even when clusters are relatively well defined. Herein, we present the infinite mixture of infinite Gaussian mixtures (I2GMM) for more flexible modeling of data sets with skewed and multi-modal cluster distributions. Instead of using a single Gaussian for each cluster, as in the standard DPMG model, the generative model of I2GMM uses a single DPMG for each cluster. The individual DPMGs are linked together by centering their base distributions at the atoms of a higher-level DP prior. Inference is performed by a collapsed Gibbs sampler that also enables partial parallelization.
Experimental results on several artificial and real-world data sets suggest that the proposed I2GMM model can predict clusters more accurately than existing variational Bayes and Gibbs sampler versions of DPMG.

Item: A web-based software tool for participatory optimization of conservation practices in watersheds (Elsevier, 2015-07)
Babbar-Sebens, Meghna; Mukhopadhyay, Snehasis; Singh, Vidya Bhushan; Piemonti, Adriana Debora; Department of Computer & Information Science, School of Science

WRESTORE (Watershed Restoration Using Spatio-Temporal Optimization of Resources) is a web-based, participatory planning tool that can be used to engage watershed stakeholder communities and involve them in using science-based, human-guided, interactive simulation-optimization methods for designing potential conservation practices on their landscape. The underlying optimization algorithms, process simulation models, and interfaces allow users not only to spatially optimize the locations and types of new conservation practices based on quantifiable goals estimated by the dynamic simulation models, but also to include their personal subjective and/or unquantifiable criteria in the location and design of these practices. In this paper, we describe the software, interfaces, and architecture of WRESTORE, provide scenarios for implementing the WRESTORE tool in a watershed community's planning process, and discuss considerations for future developments.

Item: Pin++: An Object-oriented Framework for Writing Pintools (ACM, 2015)
Hill, James H.; Feiock, Dennis C.; Department of Computer & Information Science, School of Science

This paper presents Pin++, an object-oriented framework that uses template metaprogramming to implement Pintools, which are analysis tools for the dynamic binary instrumentation tool Pin. The goal of Pin++ is to simplify programming a Pintool and promote reuse of its components across different Pintools.
Our results show that Pintools implemented with Pin++ can achieve a 54% reduction in complexity, increased modularity, and up to a 60% reduction in instrumentation overhead.

Item: PAGER: constructing PAGs and new PAG-PAG relationships for network biology (Oxford University Press, 2015-06-15)
Yue, Zongliang; Kshirsagar, Madhura M.; Nguyen, Thanh; Suphavilai, Chayaporn; Neylon, Michael T.; Zhu, Liugen; Ratliff, Timothy; Chen, Jake Yue; Department of Computer & Information Science, School of Science

In this article, we describe a new database framework to perform integrative "gene-set, network, and pathway analysis" (GNPA). In this framework, we integrate heterogeneous data on pathways, annotated lists, and gene sets (PAGs) into a PAG electronic repository (PAGER). PAGs in the PAGER database are organized into P-type, A-type, and G-type PAGs with a three-letter-code standard naming convention. The PAGER database currently compiles 44,313 genes from 5 species including human; 38,663 PAGs; 324,830 gene-gene relationships; and two types of 3,174,323 PAG-PAG regulatory relationships: co-membership based and regulatory relationship based. To help users assess each PAG's biological relevance, we developed a cohesion measure called the Cohesion Coefficient (CoCo), which is capable of disambiguating between biologically significant PAGs and random PAGs with an area-under-curve performance of 0.98. The PAGER database was set up to help users search and retrieve PAGs from its online web interface. PAGER enables advanced users to build PAG-PAG regulatory networks that provide complementary biological insights not found in gene-set analysis or individual gene network analysis. We provide a case study using cancer functional genomics data sets to demonstrate how integrative GNPA helps improve network biology data coverage and therefore biological interpretability. The PAGER database is openly accessible at http://discovery.informatics.iupui.edu/PAGER/.
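The MCMC triple-sampling idea from the Rahman and Hasan abstract above can be made concrete with a small sketch. This is not the authors' published algorithm: the state representation, the local moves in `triple_neighbors`, and the acceptance rule are illustrative assumptions. A connected triple is represented as `(center, {end1, end2})`, with both endpoints adjacent to the center, and a Metropolis-Hastings walk moves between overlapping triples:

```python
import random

def triple_neighbors(t, adj):
    """All connected triples reachable from t = (center, {end1, end2}) by one local move."""
    b, ends = t
    a, c = tuple(ends)
    out = set()
    # Move 1: swap one endpoint for another neighbor of the center.
    for x in adj[b] - {a, c}:
        out.add((b, frozenset({x, c})))
        out.add((b, frozenset({a, x})))
    # Move 2: re-center the triple on one of its endpoints.
    for end in (a, c):
        for x in adj[end] - {b}:
            out.add((end, frozenset({b, x})))
    return out

def sample_triples(adj, weight, steps, start, seed=0):
    """Metropolis-Hastings walk over connected triples.

    Assumes the graph has more than one triple; the stationary
    distribution is proportional to weight(t).
    """
    rng = random.Random(seed)
    t, nbrs = start, triple_neighbors(start, adj)
    samples = []
    for _ in range(steps):
        cand = rng.choice(list(nbrs))
        cand_nbrs = triple_neighbors(cand, adj)
        # Correct for the asymmetric proposal: neighborhoods differ in size.
        accept = min(1.0, (weight(cand) * len(nbrs)) / (weight(t) * len(cand_nbrs)))
        if rng.random() < accept:
            t, nbrs = cand, cand_nbrs
        samples.append(t)
    return samples

# Usage: walk over the 12 connected triples of the complete graph K4
# under a uniform target weight.
adj = {v: {u for u in range(4) if u != v} for v in range(4)}
start = (0, frozenset({1, 2}))
samples = sample_triples(adj, lambda t: 1.0, 5000, start)
```

Because the walk only ever consults the neighbor sets of the current triple's nodes, it needs no global view of the network, which is the property the abstract highlights for restricted-access networks; substituting a non-uniform `weight` targets an arbitrary user-defined distribution.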