Browsing by Subject "Anomaly detection"
Item Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples (ACM, 2014)
Dundar, Murat; Yerebakan, Halid Ziya; Rajwa, Bartek; Computer and Information Science, School of Science
We present a clustering algorithm for discovering rare yet significant recurring classes across a batch of samples in the presence of random effects. We model the data of each sample by an infinite mixture of Dirichlet-process Gaussian-mixture models (DPMs), with each DPM representing the noisy realization of its corresponding class distribution in a given sample. We introduce dependencies across multiple samples by placing a global Dirichlet process prior over individual DPMs. This hierarchical prior introduces a sharing mechanism across samples and allows for identifying local realizations of classes across samples. We use a collapsed Gibbs sampler for inference to recover local DPMs and identify their class associations. We demonstrate the utility of the proposed algorithm by processing a flow cytometry data set containing two extremely rare cell populations, and report results that significantly outperform competing techniques. The source code of the proposed algorithm is available on the web via the link: http://cs.iupui.edu/~dundar/aspire.htm.

Item Context-Aware Collaborative Intelligence With Spatio-Temporal In-Sensor-Analytics for Efficient Communication in a Large-Area IoT Testbed (IEEE, 2021)
Chatterjee, Baibhab; Seo, Dong-Hyun; Chakraborty, Shramana; Avlani, Shitij; Jiang, Xiaofan; Zhang, Heng; Abdallah, Mustafa; Raghunathan, Nithin; Mousoulis, Charilaos; Shakouri, Ali; Bagchi, Saurabh; Peroulis, Dimitrios; Sen, Shreyas; Electrical and Computer Engineering, Purdue School of Engineering and Technology
Decades of continuous scaling have reduced the energy of unit computing to virtually zero, while energy-efficient communication has remained the primary bottleneck in achieving fully energy-autonomous Internet-of-Things (IoT) nodes.
This article presents and analyzes the tradeoffs between the energies required for communication and computation in a wireless sensor network that is deployed in a mesh architecture over a 2400-acre university campus and is targeted toward multisensor measurement of temperature, humidity, and water nitrate concentration for smart agriculture. Several scenarios involving in-sensor analytics (ISA), collaborative intelligence (CI), and context-aware switching (CAS) of the cluster head during CI have been considered. A real-time co-optimization algorithm has been developed for minimizing the energy consumption in the network, hence maximizing the overall battery lifetime. Measurement results show that the proposed ISA consumes ≈ 467× lower energy as compared to traditional Bluetooth low energy (BLE) communication, and ≈ 69500× lower energy as compared with long-range (LoRa) communication. When the ISA is implemented in conjunction with LoRa, the lifetime of the node increases from a mere 4.3 h to 66.6 days with a 230-mAh coin cell battery, while preserving >99% of the total information. The CI and CAS algorithms help in extending the worst case node lifetime by an additional 50%, thereby exhibiting an overall network lifetime of ≈ 104 days, which is >90% of the theoretical limits as posed by the leakage current present in the system, while effectively transferring information sampled every second. A Web-based monitoring system was developed to continuously archive the measured data and to report real-time anomalies.

Item MedShift: Automated Identification of Shift Data for Medical Image Dataset Curation (IEEE, 2023)
Guo, Xiaoyuan; Wawira Gichoya, Judy; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon; Biomedical Engineering and Informatics, Luddy School of Informatics, Computing, and Engineering
Automated curation of noisy external data in the medical domain has long been in demand, as AI technologies should be validated on various sources with clean annotated data.
To curate a high-quality dataset, identifying variance between the internal and external sources is a fundamental step, as the data distributions from different sources can vary significantly and subsequently affect the performance of the AI models. The primary challenges in detecting data shifts are (1) access to private data across healthcare institutions for manual detection, and (2) the lack of automated approaches to learn efficient shift-data representation without training samples. To overcome these problems, we propose an automated pipeline called MedShift to detect the top-level shift samples and evaluate the significance of shift data without sharing data between the internal and external organizations. MedShift employs unsupervised anomaly detectors to learn the internal distribution and identify samples showing significant shiftness for external datasets, and compares their performance. To quantify the effects of detected shift data, we train a multi-class classifier that learns internal domain knowledge and evaluate the classification performance for each class in external domains after dropping the shift data. We also propose a data quality metric to quantify the dissimilarity between the internal and external datasets. We verify the efficacy of MedShift with musculoskeletal radiographs (MURA) and chest X-ray datasets from more than one external source. Experiments show our proposed shift data detection pipeline can be beneficial for medical centers to curate high-quality datasets more efficiently. The code can be found at https://github.com/XiaoyuanGuo/MedShift.
An interface introduction video to visualize our results is available at https://youtu.be/V3BF0P1sxQE.

Item MedShift: identifying shift data for medical dataset curation (2021)
Guo, Xiaoyuan; Gichoya, Judy Wawira; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
To curate a high-quality dataset, identifying data variance between the internal and external sources is a fundamental and crucial step. However, methods to detect shift or variance in data have not been significantly researched. Challenges to this are the lack of effective approaches to learn a dense representation of a dataset and the difficulties of sharing private data across medical institutions. To overcome these problems, we propose a unified pipeline called MedShift to detect the top-level shift samples and thus facilitate medical curation. Given an internal dataset A as the base source, we first train anomaly detectors for each class of dataset A to learn internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. The data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups class-wise based on the obtained scores. We then train a multi-class classifier on A and measure the shiftness with the classifier's performance variance on B by gradually dropping the group with the largest anomaly score for each class. Additionally, we adapt a dataset quality metric to help inspect the distribution differences for multiple medical sources. We verify the efficacy of MedShift with musculoskeletal radiographs (MURA) and chest X-ray datasets from more than one external source. Experiments show our proposed shift data detection pipeline can be beneficial for medical centers to curate high-quality datasets more efficiently.
An interface introduction video to visualize our results is available at https://youtu.be/V3BF0P1sxQE.

Item On Evaluating Black-Box Explainable AI Methods for Enhancing Anomaly Detection in Autonomous Driving Systems (MDPI, 2024-05-29)
Nazat, Sazid; Arreche, Osvaldo; Abdallah, Mustafa; Electrical and Computer Engineering, Purdue School of Engineering and Technology
The recent advancements in autonomous driving come with the associated cybersecurity issue of compromising networks of autonomous vehicles (AVs), motivating the use of AI models for detecting anomalies on these networks. In this context, the usage of explainable AI (XAI) for explaining the behavior of these anomaly detection AI models is crucial. This work introduces a comprehensive framework to assess black-box XAI techniques for anomaly detection within AVs, facilitating the examination of both global and local XAI methods to elucidate the decisions made by XAI techniques that explain the behavior of AI models classifying anomalous AV behavior. By considering six evaluation metrics (descriptive accuracy, sparsity, stability, efficiency, robustness, and completeness), the framework evaluates two well-known black-box XAI techniques, SHAP and LIME. The evaluation first applies the XAI techniques to identify the primary features crucial for anomaly classification, and then runs extensive experiments assessing SHAP and LIME across the six metrics using two prevalent autonomous driving datasets, VeReMi and Sensor. This study advances the deployment of black-box XAI methods for real-world anomaly detection in autonomous driving systems, contributing valuable insights into the strengths and limitations of current black-box XAI methods within this critical domain.
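
As a rough illustration of the pipeline outlined in the 2021 MedShift abstract above (fit per-class detectors on an internal dataset A, score an external dataset B without exchanging data, then group B's samples by score), the following minimal sketch may help. It is not the authors' implementation: the centroid-distance detector, the equal-size binning used in place of clustering, the toy data, and all function names are assumptions made for illustration, and the classifier-based evaluation step is omitted.

```python
import math
from statistics import mean


def fit_detector(samples):
    """Fit a toy per-class detector: centroid plus mean distance to it.

    Assumption: this stands in for the unsupervised anomaly detectors
    named in the abstract; it is not the method used by the authors.
    """
    dim = len(samples[0])
    centroid = [mean(s[i] for s in samples) for i in range(dim)]
    scale = mean(math.dist(s, centroid) for s in samples) or 1.0
    return centroid, scale


def anomaly_score(detector, sample):
    """Distance to the internal centroid, in units of the internal scale."""
    centroid, scale = detector
    return math.dist(sample, centroid) / scale


def group_by_score(scores, n_groups):
    """Split sample indices into equal-size bins, lowest scores first.

    Assumption: simple binning replaces the score-based clustering
    described in the abstract.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = math.ceil(len(order) / n_groups)
    return [order[k:k + size] for k in range(0, len(order), size)]


# Hypothetical data: internal dataset A and external dataset B, one class.
internal_a = {"normal": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]}
external_b = {"normal": [[0.5, 0.5], [0.4, 0.6], [5.0, 5.0], [0.6, 0.4]]}

# Step 1: fit one detector per class, using A only.
detectors = {label: fit_detector(x) for label, x in internal_a.items()}

# Step 2: score B with the fitted detectors; only the detectors move.
scores_b = {
    label: [anomaly_score(detectors[label], s) for s in samples]
    for label, samples in external_b.items()
}

# Step 3: group B's samples by score; the top group is the shift candidate.
groups = group_by_score(scores_b["normal"], n_groups=4)
shift_indices = groups[-1]
kept_indices = [i for g in groups[:-1] for i in g]
print("shift candidates:", shift_indices)
```

In this toy setup the sample far from the internal centroid ends up in the highest-scoring group. Dropping that group and re-evaluating a classifier trained on A would correspond to the shiftness measurement the abstract describes.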