Browsing by Author "Shah, Prasham"
Now showing 1 - 4 of 4
Item: A-MnasNet and Image Classification on NXP Bluebox 2.0 (ASTES, 2021-01)
Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Computer vision is the domain concerned with enabling technology with vision capabilities. That goal is accomplished with Convolutional Neural Networks (CNNs), the backbone of vision applications on embedded systems: they are complex, but highly efficient at extracting features. After AlexNet won the ImageNet Large Scale Visual Recognition Challenge in 2012, research on CNNs increased drastically. Networks were made deeper and wider to extract features more effectively, but their computational complexity and cost grew as well, making them very challenging to deploy on embedded hardware. Because embedded systems have limited power, speed, and computational capability, researchers turned toward making CNNs more compact while keeping feature extraction as effective as in the larger, novel architectures. This research shares that goal: it proposes a CNN with enhanced efficiency and uses it for a vision application, image classification, on the NXP Bluebox 2.0, an autonomous driving platform by NXP Semiconductors. The paper describes the Design Space Exploration technique used to derive the A-MnasNet (Augmented MnasNet) architecture, with enhanced capabilities, from the MnasNet architecture, and explains the implementation of A-MnasNet on the Bluebox 2.0 for image classification.

Item: A-MnasNet: Augmented MnasNet for Computer Vision (IEEE, 2020-08)
Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Convolutional Neural Networks (CNNs) play an essential role in deep learning and are used extensively in computer vision. They are complicated but very effective at extracting features from an image or a video stream. After AlexNet [5] won the ILSVRC [8] in 2012, research on CNNs increased drastically, and many state-of-the-art architectures such as VGG Net [12], GoogleNet [13], ResNet [18], Inception-v4 [14], Inception-ResNet-v2 [14], ShuffleNet [23], Xception [24], MobileNet [6], MobileNetV2 [7], SqueezeNet [16], and SqueezeNext [17] were introduced. The trend was to add layers to make CNNs more effective, but model size grew with them. That problem was addressed by new algorithms that reduced model size, and as a result CNN models now run on mobile devices; these mobile models are small and fast, which in turn reduces the computational cost of the embedded system. This paper follows the same idea: it proposes a new model, Augmented MnasNet (A-MnasNet), derived from MnasNet [1]. Trained on the CIFAR-10 [4] dataset, the model achieves a validation accuracy of 96.89% with a model size of 11.6 MB.
It outperforms its baseline architecture, MnasNet, which reaches a validation accuracy of 80.8% with a model size of 12.7 MB when trained on CIFAR-10.

Item: Design Space Exploration of Convolutional Neural Networks for Image Classification (2020-12)
Shah, Prasham; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

Computer vision aims to make technology as efficient as human vision. Toward that goal, after decades of research, researchers have developed algorithms that work efficiently on resource-constrained hardware such as mobile and embedded devices, making those devices capable of tasks like image classification, object detection, object recognition, semantic segmentation, and many other applications. Autonomous systems such as self-driving cars, drones, and UAVs are being developed successfully because of these advances in AI. Deep learning, a domain of machine learning within AI, focuses on developing algorithms for such applications: extracting features from raw image data, replacing pipelines of specialized models with single end-to-end models, and making models usable for multiple tasks with superior performance. A major focus is on techniques that detect and extract features which provide better context for inference about an image or video stream; the multiple deep layers of a CNN learn and automatically extract a deep hierarchy of rich features from images. CNNs are the backbone of computer vision, and they receive so much attention in deep learning because they were designed specifically for image data. They are complicated but very effective at extracting features from an image or a video stream. After AlexNet won the ILSVRC in 2012, research on CNNs increased drastically, and many state-of-the-art architectures such as VGG Net, GoogleNet, ResNet, Inception-v4, Inception-ResNet-v2, ShuffleNet, Xception, MobileNet, MobileNetV2, SqueezeNet, and SqueezeNext were introduced. The trend was to add layers to make CNNs more effective, but model size grew with them. That problem was addressed by new algorithms that reduced model size, and as a result CNN models now run on mobile devices; these mobile models are compact and have low latency, which in turn reduces the computational cost of the embedded system. This thesis follows the same idea: it proposes two new CNN architectures, A-MnasNet and R-MnasNet, derived from MnasNet by Design Space Exploration. These architectures outperform MnasNet in terms of model size and accuracy. They were trained and tested on the CIFAR-10 dataset and, furthermore, implemented on the NXP Bluebox 2.0, an autonomous driving platform, for image classification.
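The thesis above credits Design Space Exploration for deriving both architectures from MnasNet. As a rough, hypothetical illustration of one axis such an exploration can sweep, the sketch below instantiates torchvision's MNASNet at several width multipliers and reports each candidate's parameter size; it is an assumption made here for illustration, not the authors' actual search procedure.

# Illustrative sketch only: sweep MnasNet's width multiplier (one possible
# design-space axis) and report each candidate's parameter size. The
# authors' actual Design Space Exploration procedure is not reproduced here.
import torch
from torchvision.models import MNASNet

def model_size_mb(model: torch.nn.Module) -> float:
    """Parameter size in megabytes, assuming float32 storage."""
    n_params = sum(p.numel() for p in model.parameters())
    return n_params * 4 / (1024 ** 2)

for alpha in (0.35, 0.5, 0.75, 1.0):
    # CIFAR-10 has 10 classes; each alpha is one candidate design point.
    candidate = MNASNet(alpha, num_classes=10)
    print(f"alpha={alpha}: {model_size_mb(candidate):.1f} MB")
    # A full exploration would train each candidate on CIFAR-10 and record
    # validation accuracy alongside size before picking the best trade-off.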
Item: R-MnasNet: Reduced MnasNet for Computer Vision (IEEE, 2020-09)
Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

In deep learning, Convolutional Neural Networks (CNNs) are widely used for computer vision applications. With the advent of new technology there is an inevitable need for CNNs to be computationally less expensive; computational cost has become a key factor in a model's competence. CNN models must be compact and work efficiently when deployed on embedded systems. To achieve this goal, researchers have invented new algorithms that make CNNs lightweight yet accurate enough for applications like object detection. In this paper, we do the same by modifying an architecture to make it compact, with a fair trade-off between model size and accuracy. The new architecture, R-MnasNet (Reduced MnasNet), has a model size of 3 MB; trained on CIFAR-10 [4], it reaches a validation accuracy of 91.13%.
The baseline architecture, MnasNet [1], by contrast, has a model size of 12.7 MB and a validation accuracy of 80.8% when trained on the CIFAR-10 dataset. R-MnasNet can be used on resource-constrained devices and deployed on embedded systems for vision applications.
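All of the entries above report CIFAR-10 validation accuracy as the headline metric alongside model size. The sketch below shows one common way such a measurement is made in PyTorch; the trained weights and the exact preprocessing the authors used are not given here, so the normalization statistics and the evaluation setup are assumptions.

# Hedged sketch of measuring top-1 validation accuracy on CIFAR-10's test
# split in PyTorch. The trained R-MnasNet weights and the authors' exact
# preprocessing are not available here, so both are assumptions.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def validation_accuracy(model: torch.nn.Module, device: str = "cpu") -> float:
    tfm = transforms.Compose([
        transforms.ToTensor(),
        # Commonly used CIFAR-10 channel statistics (an assumption here).
        transforms.Normalize((0.4914, 0.4822, 0.4465),
                             (0.2470, 0.2435, 0.2616)),
    ])
    test_set = datasets.CIFAR10("data", train=False, download=True,
                                transform=tfm)
    loader = DataLoader(test_set, batch_size=256)
    model.eval().to(device)
    correct = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
    return correct / len(test_set)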