Browsing by Subject "DNN"
Now showing 1 - 4 of 4
Item AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources
(2021-08) Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.
The research presented in this thesis proposes a novel variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces the group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to remove redundant and insignificant elements that are either irrelevant or do not affect the performance of the network when discarded. Cardinality, a new dimension in addition to the existing spatial dimensions, and a class-balanced focal loss function, which applies a weighting factor inversely proportional to the number of samples, have been incorporated into the design of CondenseNeXt's algorithm to relieve the harsh effects of pruning. Furthermore, this CNN architecture was analyzed extensively on three benchmark image datasets (CIFAR-10, CIFAR-100 and ImageNet) by deploying the trained weights onto an ARM-based embedded computing platform, NXP BlueBox 2.0, for real-time image classification. The outputs are observed in real time in the RTMaps Remote Studio console to verify the correctness of the predicted classes. CondenseNeXt achieves state-of-the-art image classification performance on the three benchmark datasets: CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single-model, single-crop top-5 error), with up to a 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, though at the cost of a 2.26% loss in accuracy.
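As a rough illustration of why depthwise separable convolutions are cheaper than ordinary ones, the sketch below counts the weights in each kind of layer. The layer sizes and function names are hypothetical and are not taken from CondenseNeXt; biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # the 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
std = standard_conv_params(64, 128, 3)        # 73728
sep = depthwise_separable_params(64, 128, 3)  # 8768
print(std, sep, round(sep / std, 3))
```

For this assumed layer the separable form needs roughly 12% of the weights, which matches the general ratio 1/c_out + 1/k^2.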
Thus, image classification can be performed on ARM-based computing platforms with outstanding efficiency, without requiring CUDA-enabled GPU support.

Item Design Space Exploration of MobileNet for Suitable Hardware Deployment
(2020-05) Sinha, Debjyoti; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher
Designing self-regulating machines that can see and comprehend the various real-world objects around them is the main purpose of the AI domain. Recently, there have been marked advancements in the field of deep learning to create state-of-the-art DNNs for various CV applications. It is challenging to deploy these DNNs onto resource-constrained microcontroller units, as they are often quite memory intensive. Design Space Exploration (DSE) is a technique that makes a CNN/DNN memory efficient and more flexible for deployment on resource-constrained hardware. MobileNet is a small DNN architecture that was designed for embedded and mobile vision, but researchers still faced many challenges in deploying this model on resource-limited real-time processors. This thesis proposes three new DNN architectures, developed using the Design Space Exploration technique. The state-of-the-art MobileNet baseline architecture is used as the foundation for these DNN architectures, which are enhanced versions of the baseline. DSE techniques such as data augmentation, architecture tuning, and architecture modification are applied to improve the baseline architecture. First, the Thin MobileNet architecture is proposed, which uses more intricate block modules than the baseline MobileNet. It is a compact, efficient and flexible architecture with good model accuracy. To obtain more compact models, the KilobyteNet and Ultra-thin MobileNet DNN architectures are proposed. Techniques such as channel depth alteration and hyperparameter tuning are introduced along with some of the techniques used for designing the Thin MobileNet.
All the models are trained and validated from scratch on the CIFAR-10 dataset. The experimental results (training and testing) can be visualized using the live accuracy and logloss graphs provided by the Liveloss package. The Ultra-thin MobileNet model is the most balanced of the three in terms of model accuracy and model size, and hence it is deployed onto the NXP i.MX RT1060 embedded hardware unit for an image classification application.

Item Enhanced 3D Object Detection and Tracking in Autonomous Vehicles: An Efficient Multi-Modal Deep Fusion Approach
(2024-08) Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed; King, Brian S.; Rizkalla, Maher E.; Abdallah, Mustafa A.
This dissertation delves into a significant challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Systems that rely solely on cameras face difficulties with visibility over long distances, while radar-only systems struggle to recognize features like stop signs, which are crucial for safe navigation in such scenarios. To overcome this limitation, this research introduces a novel deep camera-radar fusion approach using neural networks. This method ensures reliable AV perception regardless of weather or lighting conditions. Cameras, similar to human vision, are adept at capturing rich semantic information, whereas radars can penetrate obstacles like fog and darkness, similar to X-ray vision. The thesis presents NeXtFusion, an innovative and efficient camera-radar fusion network designed specifically for robust AV perception. Building on the efficient single-sensor NeXtDet neural network, NeXtFusion significantly enhances object detection accuracy and tracking. A notable feature of NeXtFusion is its attention module, which refines critical feature representation for object detection, minimizing information loss when processing data from both cameras and radars.
Extensive experiments conducted on large-scale datasets such as Argoverse, Microsoft COCO, and nuScenes thoroughly evaluate the capabilities of NeXtDet and NeXtFusion. The results show that NeXtFusion excels in detecting small and distant objects compared to existing methods. Notably, NeXtFusion achieves a state-of-the-art mAP score of 0.473 on the nuScenes validation set, outperforming competitors like OFT by 35.1% and MonoDIS by 9.5%. NeXtFusion's excellence extends beyond mAP scores. It also performs well in other crucial metrics, including mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Visualizations of real-world scenarios from the nuScenes dataset processed by NeXtFusion provide compelling evidence of its capability to handle diverse and challenging environments.

Item Squeeze-and-Excitation SqueezeNext: An Efficient DNN for Hardware Deployment
(2020-05) Chappa, Naga Venkata Sai Raviteja; Sharkaway, Mohamed L.; King, Brian; Rizkalla, Maher
Convolutional neural networks are used in the field of autonomous driving vehicles or driver assistance systems (ADAS), and have achieved great success. Before convolutional neural networks, traditional machine learning algorithms supported driver assistance systems. Currently, there is extensive exploration of architectures such as MobileNet, SqueezeNext and SqueezeNet, which has improved CNN architectures and made them more suitable for implementation on real-time embedded systems. This thesis proposes an efficient and compact CNN to improve on the performance of existing CNN architectures. The intuition behind the proposed architecture is to replace convolution layers with a more sophisticated block module and to develop a compact architecture with competitive accuracy. Further, it explores the bottleneck module and the SqueezeNext basic block structure.
The state-of-the-art SqueezeNext baseline architecture is used as a foundation to recreate and propose a high-performance SqueezeNext architecture. The proposed architecture is then trained on the CIFAR-10 dataset from scratch. All training and testing results are visualized with live loss and accuracy graphs. The focus of this thesis is to build an adaptable and flexible model for efficient CNN performance with a minimal tradeoff between model accuracy, size, and speed. With a model size of 0.595 MB, an accuracy of 92.60%, and a satisfactory training and validation speed of 9 seconds, this model can be deployed on a real-time autonomous system platform such as BlueBox 2.0 by NXP.
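The record title refers to squeeze-and-excitation, which the abstract does not spell out. The sketch below shows the generic squeeze-and-excitation gating pattern in plain Python, assuming the usual squeeze, excitation, and scale steps. It is not the thesis's implementation: the function name, weight shapes, reduction ratio, and the omission of biases are all assumptions made for illustration.

```python
import math


def se_block(x, w1, w2):
    """Apply a squeeze-and-excitation gate to x, a list of C feature maps (each H x W).

    w1 has shape (C // r) x C and w2 has shape C x (C // r), where r is the
    reduction ratio. Biases are omitted (an assumption, for brevity).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0])) for fmap in x]
    # Excitation, part 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, part 2: expand back to C channels, sigmoid gate in (0, 1).
    gate = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2]
    # Scale: reweight every feature map by its channel gate.
    return [[[g * v for v in row] for row in fmap] for fmap, g in zip(x, gate)]


# Hypothetical input: C = 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [0.0, 2.0]]]
w1 = [[0.0, 0.0]]    # (C // r) x C
w2 = [[0.0], [0.0]]  # C x (C // r)
y = se_block(x, w1, w2)
print(y)
```

With the all-zero weights used here every gate is sigmoid(0) = 0.5, so each feature map is simply halved; trained weights would instead emphasize some channels over others.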