Browsing by Author "Guo, Xiaoyuan"
Now showing 1-7 of 7
Item: CVAD - An unsupervised image anomaly detector (Elsevier, 2022-02)
Authors: Guo, Xiaoyuan; Gichoya, Judy Wawira; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: Detecting out-of-distribution samples for image applications plays an important role in safeguarding the reliability of machine learning model deployment. In this article, we developed a software tool to support our OOD detector CVAD, a self-supervised Cascade Variational autoencoder-based Anomaly Detector, which can be easily applied to various image applications without any assumptions. The corresponding open-source software is published to support public research and tool usage.

Item: CVAD: A generic medical anomaly detector based on Cascade VAE (arXiv, 2021)
Authors: Guo, Xiaoyuan; Gichoya, Judy Wawira; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: Detecting out-of-distribution (OOD) samples in medical imaging plays an important role in downstream medical diagnosis. However, existing OOD detectors are demonstrated on natural images composed of inter-class variations and have difficulty generalizing to medical images. The key issue is the granularity of OOD data in the medical domain, where intra-class OOD samples are predominant. We focus on the generalizability of OOD detection for medical images and propose a self-supervised Cascade Variational autoencoder-based Anomaly Detector (CVAD). We use a cascade architecture of variational autoencoders, which combines latent representations at multiple scales before feeding them to a discriminator that distinguishes the OOD data from the in-distribution (ID) data. Finally, both the reconstruction error and the OOD probability predicted by the binary discriminator are used to determine the anomalies. We compare the performance with state-of-the-art deep learning models to demonstrate our model's efficacy on various open-access medical imaging datasets for both intra- and inter-class OOD.
Further extensive results on additional datasets, including common natural image datasets, show our model's effectiveness and generalizability.

Item: Margin-Aware Intra-Class Novelty Identification for Medical Images (SPIE, 2022-02)
Authors: Guo, Xiaoyuan; Gichoya, Judy Wawira; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: Purpose: Existing anomaly detection methods focus on detecting interclass variations, while medical image novelty identification is more challenging in the presence of intraclass variations. For example, a model trained with normal chest x-rays and common lung abnormalities is expected to discover and flag idiopathic pulmonary fibrosis, a rare lung disease unseen during training. The nuances of intraclass variations and the lack of relevant training data in medical image analysis pose great challenges for existing anomaly detection methods. Approach: We address these challenges by proposing a hybrid model, transformation-based embedding learning for novelty detection (TEND), which combines the merits of classifier-based and autoencoder (AE)-based approaches. Training TEND consists of two stages. In the first stage, we learn in-distribution embeddings with an AE via unsupervised reconstruction. In the second stage, we learn a discriminative classifier to distinguish in-distribution data from their transformed counterparts. Additionally, we propose a margin-aware objective to pull in-distribution data into a hypersphere while pushing away the transformed data. Eventually, the weighted sum of the class probability and the distance to the margin constitutes the anomaly score. Results: Extensive experiments are performed on three public medical image datasets with the one-vs-rest setup (one class as in-distribution data and the rest as intraclass out-of-distribution data) and the rest-vs-one setup.
Additional experiments on generated intraclass out-of-distribution data with unused transformations are conducted on the datasets. The quantitative results show competitive performance compared to state-of-the-art approaches. Qualitative examples further demonstrate the effectiveness of TEND. Conclusion: Our anomaly detection model TEND can effectively identify challenging intraclass out-of-distribution medical images in an unsupervised fashion. It can be applied to discover unseen medical image classes and to screen abnormal data for downstream medical tasks. The corresponding code is available at https://github.com/XiaoyuanGuo/TEND_MedicalNoveltyDetection.

Item: MedShift: identifying shift data for medical dataset curation (2021)
Authors: Guo, Xiaoyuan; Gichoya, Judy Wawira; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: To curate a high-quality dataset, identifying data variance between internal and external sources is a fundamental and crucial step. However, methods to detect shift or variance in data have not been significantly researched. Challenges here are the lack of effective approaches to learn a dense representation of a dataset and the difficulty of sharing private data across medical institutions. To overcome these problems, we propose a unified pipeline called MedShift to detect top-level shift samples and thus facilitate medical dataset curation. Given an internal dataset A as the base source, we first train anomaly detectors for each class of dataset A to learn internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. Data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups class-wise based on the obtained scores.
We then train a multi-class classifier on A and measure the shiftness via the classifier's performance variance on B while gradually dropping the group with the largest anomaly score for each class. Additionally, we adapt a dataset quality metric to help inspect the distribution differences across multiple medical sources. We verify the efficacy of MedShift on the musculoskeletal radiographs (MURA) and chest X-ray datasets from more than one external source. Experiments show that our proposed shift data detection pipeline can help medical centers curate high-quality datasets more efficiently. An interface introduction video visualizing our results is available at https://youtu.be/V3BF0P1sxQE.

Item: MedShift: identifying shift data for medical dataset curation (2021-12-27)
Authors: Guo, Xiaoyuan; Wawira Gichoya, Judy; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: To curate a high-quality dataset, identifying data variance between internal and external sources is a fundamental and crucial step. However, methods to detect shift or variance in data have not been significantly researched. Challenges here are the lack of effective approaches to learn a dense representation of a dataset and the difficulty of sharing private data across medical institutions. To overcome these problems, we propose a unified pipeline called MedShift to detect top-level shift samples and thus facilitate medical dataset curation. Given an internal dataset A as the base source, we first train anomaly detectors for each class of dataset A to learn internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. Data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups class-wise based on the obtained scores.
We then train a multi-class classifier on A and measure the shiftness via the classifier's performance variance on B while gradually dropping the group with the largest anomaly score for each class. Additionally, we adapt a dataset quality metric to help inspect the distribution differences across multiple medical sources. We verify the efficacy of MedShift on the musculoskeletal radiographs (MURA) and chest X-ray datasets from more than one external source. Experiments show that our proposed shift data detection pipeline can help medical centers curate high-quality datasets more efficiently. An interface introduction video visualizing our results is available at https://youtu.be/V3BF0P1sxQE.

Item: Multi-Label Medical Image Retrieval Via Learning Multi-Class Similarity (SSRN, 2022)
Authors: Guo, Xiaoyuan; Duan, Jiali; Gichoya, Judy Wawira; Trivedi, Hari; Purkayastha, Saptarshi; Sharma, Ashish; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: Introduction: Multi-label image retrieval is a challenging problem in the medical area. First, compared to natural images, labels in the medical domain exhibit higher class imbalance and more nuanced variations. Second, pair-based sampling of positives and negatives during similarity optimization is ambiguous in the multi-label setting, as samples with the same set of labels are limited. Methods: To address these challenges, we propose a proxy-based multi-class similarity (PMS) framework, which compares and contrasts samples by comparing their similarities with the discovered proxies. In this way, samples with different sets of label attributes can be utilized and compared indirectly, without the need for complicated sampling. PMS learns a class-wise feature decomposition and maintains a memory bank of positive features from each class. The memory bank keeps track of the latest features, which are used to compute the class proxies.
We compare samples based on their similarity distributions against the proxies, which provides a more stable measure under noise. Results: We benchmark over 10 popular metric learning baselines on two public chest X-ray datasets, and experiments show the consistent stability of our approach under both exact and non-exact match settings. Conclusions: We propose a methodology for multi-label medical image retrieval and design a proxy-based multi-class similarity metric, which compares and contrasts samples based on their similarity distributions with respect to the class proxies. With no prerequisites, the metric can be applied to various multi-label medical image applications. The implementation code repository will be made publicly available after acceptance.

Item: OSCARS: An Outlier-Sensitive Content-Based Radiography Retrieval System (arXiv, 2022)
Authors: Guo, Xiaoyuan; Duan, Jiali; Purkayastha, Saptarshi; Trivedi, Hari; Gichoya, Judy Wawira; Banerjee, Imon; BioHealth Informatics, School of Informatics and Computing
Abstract: Improving retrieval relevance on noisy datasets is an emerging need for the curation of large-scale clean datasets in the medical domain. While existing methods can be applied to class-wise (inter-class) retrieval, they cannot distinguish the granularity of likeness within the same class (intra-class). The problem is exacerbated on external medical datasets, where noisy samples of the same class are treated equally during training. Our goal is to identify both intra- and inter-class similarities for fine-grained retrieval. To achieve this, we propose an Outlier-Sensitive Content-based rAdiography Retrieval System (OSCARS), consisting of two steps. First, we train an outlier detector on a clean internal dataset in an unsupervised manner. Then we use the trained detector to generate anomaly scores on the external dataset, whose distribution is used to bin intra-class variations.
Second, we propose a quadruplet (a, p, n_intra, n_inter) sampling strategy, where intra-class negatives n_intra are sampled from bins of the same class other than the bin the anchor a belongs to, while inter-class negatives n_inter are randomly sampled from other classes. We suggest a weighted metric learning objective to balance intra- and inter-class feature learning. We experimented on two representative public radiography datasets, and experiments show the effectiveness of our approach.
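The OSCARS quadruplet sampling step described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the data layout (a list of (item_id, class, bin) tuples, with `bin` being the anomaly-score bin from the first step) and the function name `sample_quadruplet` are assumptions made for the example.

```python
import random
from collections import defaultdict

def sample_quadruplet(samples, rng=random):
    """Sample one (a, p, n_intra, n_inter) quadruplet.

    `samples` is a list of (item_id, cls, bin_id) tuples, where `bin_id`
    is the anomaly-score bin the item fell into. The positive p comes
    from the anchor's own class and bin; the intra-class negative from a
    different bin of the same class; the inter-class negative from any
    other class.
    """
    by_class_bin = defaultdict(list)
    by_class = defaultdict(list)
    for item in samples:
        _, cls, bin_id = item
        by_class_bin[(cls, bin_id)].append(item)
        by_class[cls].append(item)

    # Anchors need at least one same-class, same-bin peer to serve as p.
    anchors = [s for s in samples if len(by_class_bin[(s[1], s[2])]) > 1]
    a = rng.choice(anchors)
    _, a_cls, a_bin = a
    # Positive: same class and same anomaly-score bin (same "likeness" level).
    p = rng.choice([s for s in by_class_bin[(a_cls, a_bin)] if s is not a])
    # Intra-class negative: same class, different anomaly-score bin.
    n_intra = rng.choice([s for s in by_class[a_cls] if s[2] != a_bin])
    # Inter-class negative: any sample from another class.
    n_inter = rng.choice([s for s in samples if s[1] != a_cls])
    return a, p, n_intra, n_inter
```

Quadruplets drawn this way would then feed the weighted metric learning objective mentioned in the abstract, with separate weights on the intra- and inter-class negative terms.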