Browsing by Subject "computer vision"
Now showing 1 - 10 of 11
Item: A-MnasNet: Augmented MnasNet for Computer Vision (IEEE, 2020-08)
Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Convolutional Neural Networks (CNNs) play an essential role in Deep Learning and are extensively used in Computer Vision. They are complicated but very effective at extracting features from an image or a video stream. After AlexNet [5] won the ILSVRC [8] in 2012, there was a drastic increase in research related to CNNs. Many state-of-the-art architectures were introduced, including VGG Net [12], GoogleNet [13], ResNet [18], Inception-v4 [14], Inception-ResNet-v2 [14], ShuffleNet [23], Xception [24], MobileNet [6], MobileNetV2 [7], SqueezeNet [16], and SqueezeNext [17]. The trend in this research was to add layers to CNNs to make them more effective, but model size grew along with depth. This problem was addressed by new algorithms that reduced model size, and as a result we now have CNN models that run on mobile devices. These mobile models are small and fast, which in turn reduces the computational cost of the embedded system. This paper follows a similar idea: it proposes a new model, Augmented MnasNet (A-MnasNet), derived from MnasNet [1]. The model is trained on the CIFAR-10 [4] dataset and has a validation accuracy of 96.89% and a model size of 11.6 MB.
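As an aside, the reported model sizes (11.6 MB vs. 12.7 MB) track parameter count directly. A minimal sketch of the estimate, assuming 32-bit float storage and a hypothetical parameter count:

```python
def model_size_mb(n_params: int, bytes_per_weight: int = 4) -> float:
    """Estimate on-disk model size in MB from parameter count,
    assuming every weight is stored as a 32-bit float (4 bytes)
    and ignoring serialization overhead."""
    return n_params * bytes_per_weight / 2**20

# A hypothetical 3.0M-parameter network stored as float32:
print(f"{model_size_mb(3_000_000):.1f} MB")
```

Shrinking a model therefore means either fewer parameters or fewer bytes per weight.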
It outperforms its baseline architecture, MnasNet, which has a validation accuracy of 80.8% and a model size of 12.7 MB when trained on CIFAR-10.

Item: CondenseNeXtV2: Light-Weight Modern Image Classifier Utilizing Self-Querying Augmentation Policies (MDPI, 2022)
Kalgaonkar, Priyank; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Artificial Intelligence (AI) combines computer science and robust datasets to mimic, to a certain extent, the natural intelligence demonstrated by human beings in problem-solving and decision-making. From Apple's virtual personal assistant, Siri, to Tesla's self-driving cars, research and development in the field of AI is progressing rapidly. At the same time, privacy concerns surrounding the usage and storage of user data on external servers have further fueled the need for modern, ultra-efficient AI networks and algorithms. The work presented in this paper introduces a modern image classifier: a light-weight, ultra-efficient CNN intended to be deployed on local embedded systems, also known as edge devices, for general-purpose usage. This work is an extension of the award-winning paper 'CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems', published at the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC). The proposed neural network, dubbed CondenseNeXtV2, utilizes a new self-querying augmentation policy technique on the target dataset, along with adaptation to the latest version of the PyTorch framework and activation functions, resulting in improved efficiency and accuracy in image classification.
Finally, we deploy the trained weights of CondenseNeXtV2 on NXP BlueBox, an edge device designed to serve as a development platform for self-driving cars, and draw conclusions accordingly.

Item: Deep Learning Based Crop Row Detection (2022-05)
Doha, Rashed Mohammad; Anwar, Sohel; Al Hasan, Mohammad; Li, Lingxi

Detecting crop rows from video frames in real time is a fundamental challenge in precision agriculture. A deep learning based semantic segmentation method, namely U-Net, although successful in many tasks related to precision agriculture, performs poorly on this task. The reasons include the paucity of large-scale labeled datasets in this domain, the diversity of crops, and the diversity of appearance of the same crop at various stages of its growth. In this work, we discuss the development of a practical real-life crop row detection system in collaboration with an agricultural sprayer company. Our proposed method takes the output of semantic segmentation with U-Net and then applies a clustering-based probabilistic temporal calibration which can adapt to different fields and crops without retraining the network. Experimental results validate that our method can be used both for refining the results of the U-Net to reduce errors and for frame interpolation of the input video stream. Upon the availability of more labeled data, we switched our approach from a semi-supervised model to a fully supervised end-to-end crop row detection model using a Feature Pyramid Network (FPN). Central to the FPN is a pyramid pooling module that extracts features from the input image at multiple resolutions, enabling the network to use both local and global features in classifying pixels as crop rows. After training the FPN on the labeled dataset, our method obtained a mean IoU (Jaccard index) score of over 70% on the test set.
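The mean IoU (Jaccard index) used here is the standard segmentation metric: per class, the ratio of intersection to union between predicted and ground-truth masks, averaged over classes. A minimal sketch on toy integer label masks (the arrays below are illustrative, not the paper's data):

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, n_classes: int) -> float:
    """Mean intersection-over-union (Jaccard index) over all classes
    present in either the prediction or the ground truth."""
    ious = []
    for c in range(n_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:          # class absent from both masks: skip it
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x4 masks with classes {0: background, 1: crop row}:
pred   = np.array([[0, 1, 1, 0], [0, 1, 1, 0]])
target = np.array([[0, 1, 1, 1], [0, 1, 1, 1]])
print(mean_iou(pred, target, n_classes=2))
```

A score over 70% means, on average, more than 70% overlap between predicted and true crop-row regions per class.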
We trained our method on only a subset of the corn dataset and tested its performance on multiple variations of weed pressure and crop growth stages, verifying that performance is consistent across the entire dataset.

Item: Image Classification with CondenseNeXt for ARM-Based Computing Platforms (IEEE, 2021-04)
Kalgaonkar, Priyank; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

In this paper, we demonstrate the implementation of our ultra-efficient deep convolutional neural network architecture, CondenseNeXt, on NXP BlueBox, an autonomous driving development platform developed for self-driving vehicles. We show that CondenseNeXt is remarkably efficient in terms of FLOPs, is designed for ARM-based embedded computing platforms with limited computational resources, and can perform image classification without the need for a CUDA-enabled GPU. CondenseNeXt utilizes state-of-the-art depthwise separable convolution and model compression techniques to achieve remarkable computational efficiency. Extensive analyses are conducted on the CIFAR-10, CIFAR-100, and ImageNet datasets to verify the performance of the CondenseNeXt Convolutional Neural Network (CNN) architecture. It achieves state-of-the-art image classification performance on three benchmark datasets: CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error), and ImageNet (7.91% single-model, single-crop top-5 error).
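The depthwise separable convolution credited above for CondenseNeXt's FLOP efficiency factors a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise channel mix. A back-of-the-envelope multiply-accumulate count, with an illustrative layer shape (not taken from the paper):

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulates of a standard k x k convolution
    producing an h x w output feature map."""
    return h * w * k * k * c_in * c_out

def depthwise_separable_flops(h, w, k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1
    pointwise convolution mixing channels."""
    return h * w * k * k * c_in + h * w * c_in * c_out

# Illustrative layer: 32x32 output map, 3x3 kernel, 64 -> 128 channels.
std = conv_flops(32, 32, 3, 64, 128)
sep = depthwise_separable_flops(32, 32, 3, 64, 128)
print(f"reduction: {1 - sep / std:.1%}")
```

For this shape the separable form needs roughly an eighth of the multiply-accumulates, which is why such factorizations dominate mobile-oriented architectures.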
CondenseNeXt reduces the final trained model size by more than 2.9 MB and forward FLOPs by up to 59.98% compared to CondenseNet, and can perform image classification on ARM-based computing platforms without a CUDA-enabled GPU, with outstanding efficiency.

Item: Leveraging the Invariant Side of Generative Zero-Shot Learning (IEEE, 2019)
Li, Jingjing; Jing, Mengmeng; Lu, Ke; Ding, Zhengming; Zhu, Lei; Huang, Zi; Electrical and Computer Engineering, School of Engineering and Technology

Conventional zero-shot learning (ZSL) methods generally learn an embedding, e.g., a visual-semantic mapping, to handle unseen visual samples in an indirect manner. In this paper, we take advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate unseen features from random noise conditioned on semantic descriptions. Specifically, we train a conditional Wasserstein GAN in which the generator synthesizes fake unseen features from noise and the discriminator distinguishes the fake from the real via a minimax game. Considering that one semantic description can correspond to various synthesized visual samples, and that the semantic description is, figuratively, the soul of the generated features, we introduce soul samples as the invariant side of generative zero-shot learning. A soul sample is the meta-representation of one class: it captures the most semantically meaningful aspects of the samples in that category. We regularize each generated sample (the varying side of generative ZSL) to be close to at least one soul sample (the invariant side) that has the same class label. At the zero-shot recognition stage, we propose to use two classifiers, deployed in cascade, to achieve a coarse-to-fine result.
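The soul-sample regularizer described above pulls each generated feature toward a meta-representation of its class. A minimal numpy sketch, under the simplifying assumption (ours, not necessarily LisGAN's exact formulation) that there is one soul sample per class, taken as the mean of that class's generated features:

```python
import numpy as np

def soul_sample_loss(features, labels, n_classes):
    """For each generated feature, squared distance to the soul sample
    of its own class. Here a soul sample is taken to be the per-class
    mean of the generated features (one per class)."""
    souls = np.stack([features[labels == c].mean(axis=0)
                      for c in range(n_classes)])
    dists = ((features - souls[labels]) ** 2).sum(axis=1)
    return float(dists.mean())

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))          # 8 fake unseen features, dim 4
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(soul_sample_loss(feats, labels, n_classes=2))
```

Minimizing this term keeps the varying side (generated samples) anchored to the invariant side (class meta-representations).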
Experiments on five popular benchmarks verify that our proposed approach outperforms state-of-the-art methods with significant improvements.

Item: Pedestrian Detection Based on Clustered Poselet Models and Hierarchical And-Or Grammar (IEEE, 2015-04)
Li, Bo; Chen, Yaobin; Wang, Fei-Yue; Department of Electrical and Computer Engineering, Purdue School of Engineering and Technology

In this paper, a novel part-based pedestrian detection algorithm is proposed for complex traffic surveillance environments. To capture posture and articulation variations of pedestrians, we define a hierarchical grammar model with an and-or graphical structure to represent the decomposition of pedestrians; pedestrian detection is thereby converted into a parsing problem. Next, we propose clustered poselet models, which use the affinity propagation clustering algorithm to automatically select representative pedestrian part patterns in keypoint space. Trained clustered poselets are used as the terminal part models in the grammar model. Finally, after all clustered poselet activations in the input image are detected, a bottom-up inference is performed to efficiently search for maximum a posteriori (MAP) solutions in the grammar model: consistent poselet activations are combined into pedestrian hypotheses, and their bounding boxes are predicted. Both appearance scores and geometric constraints among pedestrian parts are considered during inference. A series of experiments is conducted on images from the public TUD-Pedestrian dataset and on images collected in real traffic-crossing scenarios.
The experimental results demonstrate that our algorithm outperforms other successful approaches, with high reliability and robustness in complex environments.

Item: R-MnasNet: Reduced MnasNet for Computer Vision (IEEE, 2020-09)
Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

In Deep Learning, Convolutional Neural Networks (CNNs) are widely used for Computer Vision applications. With the advent of new technology, there is an inevitable necessity for CNNs to be computationally less expensive; computational cost has become a key factor in determining a model's competence. CNN models must be compact in size and work efficiently when deployed on embedded systems. To achieve this goal, researchers have invented new algorithms which make CNNs lightweight yet accurate enough for applications like object detection. In this paper, we do the same by modifying an architecture to make it compact, with a fair trade-off between model size and accuracy. A new architecture, R-MnasNet (Reduced MnasNet), is introduced, with a model size of 3 MB. It is trained on CIFAR-10 [4] and has a validation accuracy of 91.13%, whereas the baseline architecture, MnasNet [1], has a model size of 12.7 MB and a validation accuracy of 80.8% when trained on the same dataset. R-MnasNet can be used on resource-constrained devices and deployed on embedded systems for vision applications.

Item: Residual Capsule Network (IEEE, 2019-10)
Bhamidi, Sree Bala Shruthi; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Convolutional Neural Networks (CNNs) have been among the most influential innovations in the field of Computer Vision and have brought substantial improvement to the field of Machine Learning.
But they come with their own set of drawbacks: CNNs need a large dataset, hyperparameter tuning is nontrivial, and, importantly, they lose internal information about pose and transformation due to pooling. Capsule Networks address these limitations of CNNs and have shown great improvement by computing the pose and transformation of the image. On the other hand, deeper networks are more powerful than shallow networks but are also more difficult to train: simply adding layers to make the network deep leads to the vanishing gradient problem. Residual Networks introduce skip connections to ease training and have shown that they can achieve good accuracy at considerable depth. Putting the best of Capsule Networks and Residual Networks together, we present the Residual Capsule Network, a framework that uses the best features of both. In the proposed model, the conventional convolutional layer in the Capsule Network is replaced by skip connections, as in Residual Networks, to decrease the complexity of the baseline Capsule Network and the seven-ensemble Capsule Network. We trained our model on the MNIST and CIFAR-10 datasets and noted a significant decrease in the number of parameters compared to the baseline models.

Item: Sequential Semantic Segmentation of Road Profiles for Path and Speed Planning (IEEE, 2022-12)
Cheng, Guo; Yu Zheng, Jiang; Computer and Information Science, School of Science

Driving video is available from an in-car camera for road detection and collision avoidance. However, consecutive video frames in a large volume have redundant scene coverage during vehicle motion, which hampers real-time perception in autonomous driving. This work utilizes compact road profiles (RP) and motion profiles (MP) to identify path regions and dynamic objects, which drastically reduces the video data to a lower dimension and increases the sensing rate.
To avoid collisions at close range and navigate the vehicle at middle and far ranges, several RP/MPs are scanned continuously at different depths for vehicle path planning. We train a deep network to perform semantic segmentation of RPs in the spatial-temporal domain, and we further propose a temporally shifting memory for online testing. It sequentially segments every incoming line without latency by referring to a temporal window. In streaming mode, our method generates real-time output of road, roadsides, vehicles, pedestrians, etc. at discrete depths for path planning and speed control. We have evaluated our method on naturalistic driving videos under various weather and illumination conditions; it reached the highest efficiency with the least amount of data.

Item: A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications (2021-08)
Vance, Lauren M.; Christopher, Lauren; King, Brian; Rizkalla, Maher

Deep learning solutions to computer vision tasks have revolutionized many industries in recent years, but embedded systems have too many restrictions to take advantage of current state-of-the-art configurations. Typical embedded processor hardware must meet very low power and memory constraints to maintain small and lightweight packaging, and the architectures of the current best deep learning models are too computationally intensive for such hardware. Current research shows that convolutional neural networks (CNNs) can be deployed, with a few architectural modifications, on Field-Programmable Gate Arrays (FPGAs), resulting in minimal loss of accuracy, similar or decreased processing speeds, and lower power consumption compared to general-purpose Central Processing Units (CPUs) and Graphics Processing Units (GPUs). This research contributes further to these findings with the FPGA implementation of a YOLOv4 object detection model developed using transfer learning.
The transfer-learned model uses the weights of a model pre-trained on the MS-COCO dataset as a starting point, then fine-tunes only the output layers for detection of more specific objects in five classes. The model architecture was then modified slightly for compatibility with the FPGA hardware, using techniques such as weight quantization and replacement of unsupported activation layer types. The model was deployed on three different hardware setups (CPU, GPU, FPGA) for inference on a test set of 100 images. The FPGA achieved real-time inference speeds of 33.77 frames per second, a speedup of 7.74 frames per second compared to GPU deployment. The model also consumed 96% less power than the GPU configuration, with only approximately 4% average loss in accuracy across all five classes. The results are even more striking compared to CPU deployment, with a 131.7-times speedup in inference throughput. CPUs have long since been outperformed by GPUs for deep learning applications but are used in most embedded systems. These results further illustrate the advantages of FPGAs for deep learning inference on embedded systems, even when transfer learning is used for an efficient end-to-end deployment process. This work advances the current state of the art with the implementation of a YOLOv4 object detection model developed with transfer learning for FPGA deployment.
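The weight quantization mentioned above maps floating-point weights to low-bit integers plus a scale factor, shrinking the model and enabling integer arithmetic on the FPGA. The sketch below shows generic symmetric per-tensor int8 quantization, not the specific toolchain used in this work:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64,)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(f"max abs quantization error: {err:.5f}")
```

Each weight drops from 4 bytes to 1, and the worst-case rounding error is bounded by half the scale step, which is why accuracy loss is typically small.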