Browsing by Subject "Computer Vision"
Now showing 1 - 10 of 18
Item: AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources (2021-08)
Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.
The research presented in this thesis proposes a neoteric variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces the group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique that removes redundant and insignificant elements which are irrelevant or do not affect the performance of the network. To relieve the harsh effects of pruning, cardinality, a new dimension alongside the existing spatial dimensions, and a class-balanced focal loss function, whose weighting factor is inversely proportional to the number of samples per class, are incorporated into the design of CondenseNeXt. Furthermore, extensive analyses of this novel CNN architecture were performed on three benchmark image datasets, CIFAR-10, CIFAR-100 and ImageNet, by deploying the trained weights onto an ARM-based embedded computing platform, the NXP BlueBox 2.0, for real-time image classification. The outputs are observed in real time in the RTMaps Remote Studio console to verify the correctness of the predicted classes. CondenseNeXt achieves state-of-the-art image classification performance on the three benchmark datasets, including CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single-model, single-crop top-5 error), with up to a 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, although at the cost of a 2.26% loss in accuracy. It thus performs image classification on ARM-based computing platforms with outstanding efficiency and without requiring CUDA-enabled GPU support.
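
As a rough illustration of the class-balanced reweighting described in this abstract (a per-class weight inversely proportional to the number of training samples, combined with a focal term), a minimal PyTorch-style sketch is given below; the function name, the effective-number formulation and the hyperparameters are assumptions for illustration, not the exact loss used in the thesis.

    import torch
    import torch.nn.functional as F

    def class_balanced_focal_loss(logits, targets, samples_per_class, beta=0.9999, gamma=2.0):
        # Per-class weights inversely proportional to the (effective) number of samples.
        effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
        weights = (1.0 - beta) / effective_num
        weights = weights / weights.sum() * len(samples_per_class)  # normalize to the class count

        ce = F.cross_entropy(logits, targets, reduction="none")     # per-sample cross entropy
        pt = torch.exp(-ce)                                         # probability of the true class
        focal = (1.0 - pt) ** gamma * ce                            # focal modulation of hard examples
        return (weights[targets] * focal).mean()
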

Item: Analysis of Latent Space Representations for Object Detection (2024-08)
Dale, Ashley Susan; Christopher, Lauren; King, Brian; Salama, Paul; Rizkalla, Maher
Deep Neural Networks (DNNs) successfully perform object detection tasks, and the Convolutional Neural Network (CNN) backbone is a commonly used feature extractor before secondary tasks such as detection, classification, or segmentation. In a DNN model, the relationship between the features learned by the model from the training data and the features leveraged by the model during test and deployment has motivated the area of feature interpretability studies. The work presented here applies equally to white-box and black-box models and to any DNN architecture. The metrics developed do not require any information beyond the feature vector generated by the feature extraction backbone. These methods are therefore the first capable of estimating black-box model robustness in terms of latent space complexity and the first capable of examining feature representations in the latent space of black-box models. This work contributes the following four novel methodologies and results. First, a method for quantifying the invariance and/or equivariance of a model using the training data shows that the representation of a feature in the model impacts model performance. Second, a method for quantifying an observed domain gap in a dataset using the latent feature vectors of an object detection model is paired with pixel-level augmentation techniques to close the gap between real and synthetic data. This results in an improvement in the model's F1 score on a test set of outliers from 0.5 to 0.9. Third, a method for visualizing and quantifying similarities of the latent manifolds of two black-box models is used to correlate similar feature representations with increased success in the transferability of gradient-based attacks. Finally, a method for examining the global complexity of decision boundaries in black-box models is presented, where more complex decision boundaries are shown to correlate with increased model robustness to gradient-based and random attacks.
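
The first contribution, measuring invariance from backbone feature vectors alone, might be sketched along the lines below; the cosine-similarity score and all names are illustrative assumptions and not the metrics actually defined in the thesis.

    import torch
    import torch.nn.functional as F

    def invariance_score(backbone, images, transform):
        # Compare backbone features of original and transformed images; a score near 1
        # suggests the latent representation is invariant to the given transform.
        backbone.eval()
        with torch.no_grad():
            f_orig = backbone(images).flatten(1)
            f_aug = backbone(transform(images)).flatten(1)
        return F.cosine_similarity(f_orig, f_aug, dim=1).mean().item()

    # Hypothetical usage: rotation invariance of a feature extractor on a batch of images.
    # score = invariance_score(model.backbone, batch, lambda x: torch.rot90(x, 1, dims=(2, 3)))
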

Item: Computer Vision Based Robust Lane Detection Via Multiple Model Adaptive Estimation Technique (2021-12)
Fakhari, Iman; Anwar, Sohel; Al Hasan, Mohammad; Lingxi, Li
The lane-keeping system in autonomous vehicles (AV), or as part of an advanced driving assistant system (ADAS), is known as one of the primary features of AVs and ADAS. Existing lane-keeping systems rely on either computer vision or deep learning algorithms for their lane detection stage. However, even the strongest image processing pipelines or the most robust deep learning algorithms for lane detection show inaccuracies under certain conditions. The sources of these inaccuracies can be rainy or foggy weather, high-contrast shadows of buildings and objects on the street, or faded lane markings. Since the lane detection unit of these systems is responsible for controlling the steering, even a momentary loss of lane detection accuracy could result in an accident or failure. Different lane detection algorithms based on computer vision and deep learning have been presented during the last few years, and each one has pros and cons: a model may perform well in some situations and fail in others. For example, deep learning-based methods are vulnerable to new samples. In this research, multiple lane detection models are evaluated and used together to implement a robust lane detection algorithm. The purpose of this research is to develop an estimator-based Multiple Model Adaptive Estimation (MMAE) algorithm for the lane-keeping system to improve the robustness of lane detection. To verify the performance of the implemented algorithm, the AirSim simulation environment was used. The test vehicle in the simulation was equipped with one front camera and one rear camera, which were used to implement the proposed algorithm. The front camera images are used to detect the lane and the offset between the vehicle and the center point of the lane. The rear camera, which offered better performance in lane detection, was used as an estimator for calculating the uncertainty of each model. The simulation results showed that combining the two implemented models with MMAE performed robustly even in case studies where one of the models failed. The proposed algorithm was able to detect the failure of either model and then switch to the working model, improving the robustness of the lane detection system. The proposed algorithm still has some limitations; it could be improved by replacing the PID controller with an MPC controller in future studies. In addition, the presented algorithm uses two computer vision-based models; adding a deep learning-based model could improve the performance of the proposed MMAE. To obtain a robust deep learning-based model, it is suggested to train the network on AirSim output images; otherwise, the network will not work accurately due to differences in camera location, camera configuration, colors, and contrast.
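
A single MMAE-style step, in which each lane detection model is weighted by the likelihood of its residual against the rear-camera estimate, could look roughly like the following; this is a generic sketch with assumed names and a one-shot weighting rather than the recursive formulation and exact uncertainty model used in the thesis.

    import numpy as np

    def mmae_step(offsets, reference_offset, variances):
        # Residual of each model's lateral offset against the rear-camera reference.
        residuals = np.asarray(offsets, dtype=float) - reference_offset
        variances = np.asarray(variances, dtype=float)
        # Gaussian likelihood of each residual under the assumed model variance.
        likelihoods = np.exp(-0.5 * residuals**2 / variances) / np.sqrt(2.0 * np.pi * variances)
        weights = likelihoods / likelihoods.sum()
        fused_offset = float(np.dot(weights, offsets))   # probability-weighted estimate
        best_model = int(np.argmax(weights))             # or hard-switch to the most likely model
        return fused_offset, best_model

    # Hypothetical usage with two vision models and a rear-camera reference offset:
    # offset, model_idx = mmae_step([0.32, 0.80], reference_offset=0.35, variances=[0.05, 0.05])
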

Item: Deep Image Processing with Spatial Adaptation and Boosted Efficiency & Supervision for Accurate Human Keypoint Detection and Movement Dynamics Tracking (2023-05)
Dai, Chao Yang; Zhang, Qingxue; King, Brian S.; Fang, Shiaofen
This thesis aims to design and develop a spatial adaptation approach, built on spatial transformers, to improve the accuracy of human keypoint recognition models. We have studied different model types and design choices to gain an accuracy increase over models without spatial transformers and analyzed how spatial transformers increase the accuracy of predictions. A neural network called Widenet has been leveraged as a specialized network for providing the parameters of the spatial transformer. Further, we have evaluated methods to reduce the model parameters, as well as a strategy to enhance the learning supervision, to further improve the performance of the model. Our experiments and results show that the proposed deep learning framework can effectively detect human keypoints compared with the baseline methods. We have also reduced the model size without significantly impacting performance, and the enhanced supervision has improved performance. This study is expected to greatly advance deep learning for human keypoint detection and movement dynamics tracking.
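
For orientation, a standard spatial transformer block in PyTorch looks roughly as follows; the small localization network here merely stands in for the Widenet parameter network mentioned above, and the layer sizes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SpatialTransformer(nn.Module):
        """A localization network predicts a 2x3 affine transform that warps the
        input before keypoint detection."""

        def __init__(self, in_channels=3):
            super().__init__()
            self.localization = nn.Sequential(
                nn.Conv2d(in_channels, 8, kernel_size=7), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(8 * 4 * 4, 32), nn.ReLU(),
                nn.Linear(32, 6),  # six parameters of the affine matrix
            )
            # Start from the identity transform so early training leaves inputs unwarped.
            self.localization[-1].weight.data.zero_()
            self.localization[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

        def forward(self, x):
            theta = self.localization(x).view(-1, 2, 3)
            grid = F.affine_grid(theta, x.size(), align_corners=False)
            return F.grid_sample(x, grid, align_corners=False)
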

Item: Design Space Exploration of Convolutional Neural Networks for Image Classification (2020-12)
Shah, Prasham; Mohamed, El-Sharkawy; King, Brian; Rizkalla, Maher
Computer vision is a domain that pursues the goal of making technology as capable as human vision. To achieve that goal, after decades of research, researchers have developed algorithms that are able to work efficiently on resource-constrained hardware, such as mobile or embedded devices, for computer vision applications. Due to these constant efforts, such devices have become capable of tasks like Image Classification, Object Detection, Object Recognition, Semantic Segmentation, and many other applications. Autonomous systems like self-driving cars, drones, and UAVs are being successfully developed because of these advances in AI. Deep Learning, a part of AI, is a specific domain of Machine Learning that focuses on developing algorithms for such applications. Deep Learning deals with tasks like extracting features from raw image data, replacing pipelines of specialized models with single end-to-end models, and making models usable for multiple tasks with superior performance. A major focus is on techniques to detect and extract features which provide better context for inference about an image or video stream. A deep hierarchy of rich features can be learned and automatically extracted from images by the multiple deep layers of CNN models. CNNs are the backbone of computer vision. The reason that CNNs are the focus of attention for deep learning models is that they were specifically designed for image data. They are complicated but very effective in extracting features from an image or a video stream. After AlexNet won the ILSVRC in 2012, there was a drastic increase in research related to CNNs. Many state-of-the-art architectures like VGG Net, GoogleNet, ResNet, Inception-v4, Inception-ResNet-v2, ShuffleNet, Xception, MobileNet, MobileNetV2, SqueezeNet, SqueezeNext and many more were introduced. The trend in this research was to increase the number of layers of a CNN to make it more effective, but with that the model size increased as well. This problem was addressed by new algorithms that reduced model size. As a result, today we have CNN models that can be deployed on mobile devices. These mobile models are compact and have low latency, which in turn reduces the computational cost of the embedded system. This thesis follows a similar idea: it proposes two new CNN architectures, A-MnasNet and R-MnasNet, which have been derived from MnasNet by design space exploration. These architectures outperform MnasNet in terms of model size and accuracy. They have been trained and tested on the CIFAR-10 dataset. Furthermore, they were implemented on the NXP BlueBox 2.0, an autonomous driving platform, for image classification.

Item: Detection and Localization of Root Damages in Underground Sewer Systems using Deep Neural Networks and Computer Vision Techniques (2022-12)
Zheng, Muzi; Fang, Shiaofen; Tuceryan, Mihran; Liang, Yao
The maintenance of a healthy sewer infrastructure is a major challenge due to root damage from nearby plants that grow through pipe cracks or loose joints, which may lead to serious pipe blockages and collapse. Traditional inspections based on video surveillance to identify and localize root damages within such complex sewer networks are inefficient, laborious, and error-prone. Therefore, this study aims to develop a robust and efficient approach to automatically detect root damages and localize their circumferential and longitudinal positions in CCTV inspection videos by applying deep neural networks and computer vision techniques. With twenty inspection videos collected from various sources, keyframes were extracted from each video according to frame differences in the LUV color space with selected local maxima. To recognize distance information from video subtitles, OCR models such as Tesseract and CRNN-CTC were implemented and achieved a recognition accuracy of 90%. In addition, a pre-trained segmentation model was applied to detect root damages, but it produced many false positive predictions. By applying a well-tuned YoloV3 model to the detection of pipe joints and leveraging the Convex Hull Overlap (CHO) feature, we were able to achieve a 20% improvement in the reliability and accuracy of damage identification. Moreover, an end-to-end deep learning pipeline that uses the Triangle Similarity Theorem (TST) was designed to predict the longitudinal position of each identified root damage, with a prediction error of less than 1.0 foot.
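
The keyframe selection step, picking frames at local maxima of inter-frame differences in the LUV color space, might be sketched with OpenCV roughly as below; the min_gap heuristic and all names are illustrative assumptions rather than the thesis's exact procedure.

    import cv2
    import numpy as np

    def extract_keyframes(video_path, min_gap=30):
        # Compute the mean absolute difference between consecutive frames in LUV space
        # and keep frames at local maxima of that difference signal.
        cap = cv2.VideoCapture(video_path)
        diffs, frames = [], []
        prev = None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            luv = cv2.cvtColor(frame, cv2.COLOR_BGR2LUV).astype(np.float32)
            if prev is not None:
                diffs.append(np.mean(np.abs(luv - prev)))
                frames.append(frame)
            prev = luv
        cap.release()

        # Keep frames whose difference is a local maximum, at least min_gap frames apart.
        keyframes, last_idx = [], -min_gap
        for i in range(1, len(diffs) - 1):
            if diffs[i] > diffs[i - 1] and diffs[i] > diffs[i + 1] and i - last_idx >= min_gap:
                keyframes.append(frames[i])
                last_idx = i
        return keyframes
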

Item: Detection of histological features in liver biopsy images to help identify Non-Alcoholic Fatty Liver Disease (2018-04-26)
Sethunath, Deepak
This thesis explores a minimally invasive approach to diagnosing Non-Alcoholic Fatty Liver Disease (NAFLD) in mice and humans, which can be useful for pathologists when performing their diagnosis. NAFLD is a spectrum of diseases ranging from least to most severe: steatosis, steatohepatitis, fibrosis and finally cirrhosis. The disease primarily results from fat deposition in the liver that is unrelated to alcohol or viral causes. In general, it affects individuals having a combination of at least three of the five metabolic syndromes, namely obesity, hypertension, diabetes, hypertriglyceridemia, and hyperlipidemia. Given how common these metabolic syndromes have become, the rate of NAFLD has increased dramatically over the years, affecting about three-quarters of all obese individuals, including many children, and making it one of the most common diseases in the United States. Our study focuses on building computational models that help identify different histological features in a liver biopsy image, thereby analyzing whether a person is affected by NAFLD or not. Here, we develop and validate the performance of automated classifiers, built using image processing and machine learning methods, for the detection of macro- and microsteatosis and lobular and portal inflammation, and we also categorize different types of fibrosis in murine and human fatty liver disease. We study the correlation of the automated quantification of macrosteatosis, lobular and portal inflammation, and fibrosis (amount of collagen) with an expert pathologist's semi-quantitative grades. Our models achieve a precision and sensitivity of 94.2% and 95% for macrosteatosis and 79.2% and 77% for microsteatosis. They detect lobular and portal inflammation with a precision and sensitivity of 79.6% and 77.1% for lobular inflammation and 86% and 90.4% for portal inflammation. We also present the first study on the identification of six different types of fibrosis, with a precision of 85.6% for normal fibrosis and >70% for portal fibrosis, periportal fibrosis, pericellular fibrosis, bridging fibrosis and cirrhosis. We have also quantified the amount of collagen in a liver biopsy and compared it to the pathologist's semi-quantitative fibrosis grade.

Item: Detection of Stroke, Blood Vessel Landmarks, and Leptomeningeal Anastomoses in Mouse Brain Imaging (2022-12)
Zhang, Leqi; Christopher, Lauren A.; King, Brian; Salama, Paul
Collateral connections in the brain, also known as leptomeningeal anastomoses, are connections between blood vessels originating from different arteries. Despite limited knowledge about them, they are suggested to be an important contributor to cerebral stroke recovery, since they allow additional blood flow through the affected area. However, few databases and algorithms exist for the specific task of locating them. In this work, a MATLAB program is developed to find these connections and detect strokes, replacing manual labeling by professionals. The limited data available for this study are 23 2D microscopy images of mouse cerebral vascular structures highlighted by dyes. In these images, strokes are shown to diminish the vessel pixel count to below 80% of that of the healthy brain. Stroke classification error is greatly reduced by narrowing the scope from comparing the entire hemisphere to comparing one smaller region. A novel way of finding collateral connections uses connected components, which organize all adjacent pixels into groups. All collateral connections can be found on the border of two neighboring arterial flow regions and belong to the same connected component as the arterial source on each side. Along with finding collateral connections, a newly created coordinate system allows regions to be defined relative to brain landmarks, based on the brain's center, orientation, and scale. The proposed method combines stroke detection, brain coordinate system extraction, and collateral connection detection in stroke-affected mouse brains using only image processing techniques. This allows a simpler, more explainable result on limited data than techniques such as supervised machine learning. In addition, the new method does not require ground truth or a large image count for training. The output of this automated process was successfully interpreted by medical experts, which allows for further research into automating collateral connection detection in 3D.
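
The thesis implements this in MATLAB; as a hedged Python/SciPy analogue of the connected-component idea (keeping vessel components that touch both neighboring arterial flow regions as candidate collaterals), a sketch might look like the following, with all names and inputs assumed for illustration.

    import numpy as np
    from scipy import ndimage

    def find_collateral_candidates(vessel_mask, territory_a, territory_b):
        # Label the binary vessel mask and keep components containing vessel pixels
        # in both neighboring arterial territories, i.e. candidate anastomoses
        # bridging the two flow regions. Inputs are boolean 2D arrays.
        labels, n = ndimage.label(vessel_mask)          # group adjacent vessel pixels
        candidates = []
        for comp in range(1, n + 1):
            comp_mask = labels == comp
            # A candidate collateral touches both arterial flow regions.
            if (comp_mask & territory_a).any() and (comp_mask & territory_b).any():
                candidates.append(comp)
        return labels, candidates
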

Item: Enhanced Multiple Dense Layer EfficientNet (2024-08)
Mohan, Aswathy; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher
In the dynamic and ever-evolving landscape of Artificial Intelligence (AI), the domain of deep learning has emerged as a pivotal force, propelling advancements across a broad spectrum of applications, notably in the intricate field of image classification. Image classification, a critical task that involves categorizing images into predefined classes, serves as the backbone for numerous cutting-edge technologies, including but not limited to automated surveillance, facial recognition systems, and advanced diagnostics in healthcare. Despite the significant strides made in the area, the quest for models that not only excel in accuracy but also generalize robustly across varied datasets and remain resilient against overfitting remains a formidable challenge. EfficientNetB0, a model celebrated for its optimized balance between computational efficiency and accuracy, stands at the forefront of solutions addressing these challenges. However, the nuanced complexities of datasets such as CIFAR-10, characterized by its diverse array of images spanning ten distinct categories, call for specialized adaptations to harness the full potential of such sophisticated architectures. In response, this thesis introduces an optimized version of the EfficientNetB0 architecture, enhanced with strategic architectural modifications, including the incorporation of an additional Dense layer with 512 units and the use of Dropout regularization. These adjustments are designed to amplify the model's capacity for learning and interpreting the complex patterns inherent in the data. Complementing these architectural refinements, a two-phase training methodology is adopted for the proposed model. The approach commences with an initial training phase in which the base model's pre-trained weights are frozen, leveraging transfer learning to secure a solid foundation. The subsequent fine-tuning phase, characterized by the selective unfreezing of layers, calibrates the model to the intricacies of the CIFAR-10 dataset. This is further bolstered by adaptive learning rate adjustments, ensuring the training process is both efficient and responsive to the learning curve. Through a comprehensive suite of evaluations, encompassing accuracy assessments, confusion matrices, and detailed classification reports, the proposed model demonstrates a notable improvement in performance. The insights gained from this research not only shed light on the mechanisms underpinning successful image classification models but also chart a course for future work aimed at bridging the gap between theoretical models and their practical applications. This research encapsulates the iterative process of model enhancement, providing a beacon for future endeavors in the quest for optimal image classification solutions.
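
A minimal Keras sketch of the described architecture and two-phase schedule is given below, assuming resized CIFAR-10 inputs and hypothetical train_ds/val_ds datasets; the dropout rate, learning rates, and number of unfrozen layers are illustrative assumptions rather than the thesis's exact configuration.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # EfficientNetB0 backbone plus an extra Dense(512) layer and Dropout, as described above.
    base = tf.keras.applications.EfficientNetB0(include_top=False, weights="imagenet",
                                                input_shape=(224, 224, 3), pooling="avg")
    model = models.Sequential([
        base,
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax"),   # ten CIFAR-10 classes
    ])

    # Phase 1: freeze the pre-trained backbone and train only the new head.
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)

    # Phase 2: selectively unfreeze the top of the backbone and fine-tune with a
    # lower, adaptively reduced learning rate.
    base.trainable = True
    for layer in base.layers[:-30]:               # keep earlier layers frozen
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2)
    # model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[reduce_lr])
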