Browsing by Subject "Neural Networks"
Now showing 1 - 5 of 5
Item: A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection (2024-05)
Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

Autonomous vehicle technology takes a significant step forward with an improved version of the Multimodal Vehicle Detection Network (MVDNet), distinguished by the integration of a multi-head attention layer. This enhancement refines the network's ability to process and integrate multimodal sensor data, which becomes crucial under challenging weather conditions such as the obscuring presence of dense fog or the complexities introduced by heavy snowfall. The upgraded Multi-Head MVDNet is rigorously verified on an extensive dataset acquired from the Oxford Radar RobotCar and, in complex environmental conditions, shows a marked improvement in Average Precision (AP) over existing models.

The design builds on the foundational principles of the original MVDNet, which amalgamates the individual strengths of lidar and radar sensors through feature tensor fusion to create a more robust and comprehensive interpretation of the sensory data. The major innovation of the updated model is a multi-head attention layer that replaces the previously employed self-attention mechanism: segmenting attention into several distinct heads improves both the efficiency and the accuracy with which the network processes large volumes of sensor data. An exhaustive series of experiments explored various head configurations and identified seven attention heads as the most effective, optimizing the balance between computational efficiency and detection accuracy. Tested on the radar and lidar datasets of the ORR project, the Multi-Head MVDNet consistently outperformed the original MVDNet as well as lidar-only and DEF models in vehicle detection accuracy.

Beyond this improvement, the work lays a foundation for future research, opening new pathways for exploring attention mechanisms in real-time vehicle detection, and it underscores the importance of sophisticated sensor fusion techniques in overcoming adverse environmental conditions, paving the way for more resilient and reliable autonomous vehicle technologies.
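The key architectural change described in this abstract, replacing MVDNet's single self-attention block with a seven-head attention layer over fused lidar and radar features, can be sketched roughly as follows. This is a minimal illustration rather than the thesis implementation; the module name, token shapes, and concatenation-based fusion step are assumptions made for the example.

```python
# Minimal sketch (not the thesis code): 7-head attention over concatenated
# lidar/radar feature tokens, standing in for a single self-attention block.
import torch
import torch.nn as nn

class MultiHeadFusionBlock(nn.Module):
    def __init__(self, embed_dim=224, num_heads=7):
        super().__init__()
        # embed_dim must be divisible by num_heads (224 = 7 * 32).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, lidar_feats, radar_feats):
        # Concatenate the per-sensor feature tokens along the sequence axis,
        # then let every head attend across both modalities at once.
        tokens = torch.cat([lidar_feats, radar_feats], dim=1)  # (B, N_l + N_r, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)                    # residual connection

if __name__ == "__main__":
    lidar = torch.randn(2, 100, 224)   # (batch, lidar tokens, channels) -- illustrative
    radar = torch.randn(2, 100, 224)   # (batch, radar tokens, channels) -- illustrative
    print(MultiHeadFusionBlock()(lidar, radar).shape)  # torch.Size([2, 200, 224])
```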
Item: AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources (2021-08)
Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.

The research presented within this thesis proposes CondenseNeXt, a new variant of deep convolutional neural network architecture designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces the group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to remove redundant and insignificant elements that are either irrelevant or do not affect the performance of the network. Cardinality, a new dimension added to the existing spatial dimensions, and a class-balanced focal loss function, which weights classes inversely to their number of samples, have been incorporated into the design of CondenseNeXt to relieve the harsh effects of pruning. Furthermore, extensive analyses of this architecture were performed on three benchmark image datasets, CIFAR-10, CIFAR-100 and ImageNet, by deploying the trained weights onto an ARM-based embedded computing platform, the NXP BlueBox 2.0, for real-time image classification. The outputs were observed in real time in the RTMaps Remote Studio console to verify the correctness of the predicted classes. CondenseNeXt achieves state-of-the-art image classification performance on the three benchmark datasets, including CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single-model, single-crop top-5 error), and up to a 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also be compressed to a final trained model size of 2.9 MB, at the cost of a 2.26% loss in accuracy, and thus performs image classification on ARM-based computing platforms with outstanding efficiency and without requiring a CUDA-enabled GPU.
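The central layer substitution named in this abstract, a depthwise separable convolution in place of CondenseNet's group convolutions, can be illustrated with a short sketch. This is an assumed, minimal formulation rather than the CondenseNeXt code; the layer sizes and example input are illustrative only.

```python
# Minimal sketch (assumed layer sizes, not the thesis code): a 3x3 depthwise
# convolution followed by a 1x1 pointwise convolution, the pairing that
# depthwise separable convolutions use to approximate a full convolution cheaply.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups == in_channels).
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Pointwise: a 1x1 convolution that mixes channels into out_channels maps.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

if __name__ == "__main__":
    x = torch.randn(1, 32, 32, 32)                  # e.g. a CIFAR-sized feature map
    print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```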
Item: Lidar Based 3D Object Detection Using Yolov8 (2024-08)
Menon, Swetha Suresh; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

Autonomous vehicles have gained substantial traction as the future of transportation, necessitating continuous research and innovation. While 2D object detection and instance segmentation methods have made significant strides, 3D object detection offers unparalleled precision. Deep neural network-based 3D object detection, coupled with sensor fusion, has become indispensable for self-driving vehicles, enabling a comprehensive grasp of the spatial geometry of physical objects. In our study of a lidar-based 3D object detection network using point clouds, we propose a novel architectural model based on the You Only Look Once (YOLO) framework. This model combines the efficiency and accuracy of the YOLOv8 network, a fast, state-of-the-art 2D object detector, with the real-time 3D object detection capability of the Complex YOLO model. By integrating the YOLOv8 model as the backbone network and employing the Euler Region Proposal (ERP) method, our approach achieves rapid inference speeds, surpassing other object detection models while upholding high accuracy standards. Our experiments, conducted on the KITTI dataset, demonstrate the superior efficiency of the new architectural model: it outperforms its predecessors, advancing the field of 3D object detection in autonomous vehicles.

Item: A Machine Learning Model for Average Fuel Consumption in Heavy Vehicles (IEEE, 2019)
Schoen, Alexander; Byerly, Andy; Hendrix, Brent; Bagwe, Rishikesh Mahesh; dos Santos, Euzeli C., Jr.; Ben Miled, Zina; Electrical and Computer Engineering, School of Engineering and Technology

This paper advocates a data summarization approach based on distance rather than the traditional time period when developing individualized machine learning models for fuel consumption. This approach is used in conjunction with seven predictors derived from vehicle speed and road grade to produce a highly predictive neural network model for average fuel consumption in heavy vehicles. The proposed model can easily be developed and deployed for each individual vehicle in a fleet in order to optimize fuel consumption over the entire fleet. The predictors of the model are aggregated over fixed window sizes of distance traveled. Different window sizes are evaluated, and the results show that a 1 km window is able to predict fuel consumption with a 0.91 coefficient of determination and a mean absolute peak-to-peak percent error of less than 4% for routes that include both city and highway duty cycle segments.
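The distinguishing idea of the fuel-consumption model above, summarizing predictors over fixed windows of distance traveled (for example 1 km) rather than over fixed time periods, might be expressed roughly as in the sketch below. The seven exact predictors used in the paper are not reproduced here; the aggregated features, helper name, and synthetic trip log are illustrative assumptions.

```python
# Minimal sketch (illustrative features, not the paper's exact predictors):
# aggregate a trip log into fixed-distance windows before model training.
import numpy as np

def distance_windows(distance_m, speed_mps, grade_pct, fuel_l, window_m=1000.0):
    """Slice a trip log into fixed-distance (default 1 km) windows.

    Each input array holds one sample per log record: cumulative distance (m),
    speed (m/s), road grade (%) and cumulative fuel used (litres). Each output
    row is one window's feature vector and its fuel-consumption target.
    """
    features, targets = [], []
    start = 0
    for end in range(1, len(distance_m)):
        span = distance_m[end] - distance_m[start]
        if span >= window_m:
            spd, grd = speed_mps[start:end], grade_pct[start:end]
            features.append([spd.mean(), spd.std(), spd.max(),
                             grd.mean(), grd.std(), grd.max(), span])
            # Target: litres per 100 km over this window.
            used = fuel_l[end] - fuel_l[start]
            targets.append(100.0 * used / (span / 1000.0))
            start = end
    return np.array(features), np.array(targets)

if __name__ == "__main__":
    n = 5000
    dist = np.cumsum(np.full(n, 5.0))                 # 5 m between samples
    speed = 15 + 5 * np.sin(np.linspace(0, 20, n))    # synthetic speed trace (m/s)
    grade = 2 * np.sin(np.linspace(0, 10, n))         # synthetic road grade (%)
    fuel = np.cumsum(np.full(n, 0.0004))              # synthetic cumulative litres
    X, y = distance_windows(dist, speed, grade, fuel)
    print(X.shape, y.shape)                           # one row per 1 km window
```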
Item: Multi-spectral Fusion for Semantic Segmentation Networks (2023-05)
Edwards, Justin; El-Sharkawy, Mohamed; King, Brian; Kim, Dongsoo

Semantic segmentation is a machine learning task that is seeing increased utilization in multiple fields, from medical imagery to land demarcation and autonomous vehicles. Semantic segmentation performs the pixel-wise classification of images, creating a new, segmented representation of the input that can be useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in networks tackling the semantic segmentation task, particularly in the field of autonomous driving systems. The requirements of automated driver assistance systems (ADAS) drive semantic segmentation models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on fusing long wave infrared (LWIR) imagery with visual spectrum imagery to fill in the inherent performance gaps of visual imagery alone. This comes with a host of benefits, such as increased performance in varied lighting conditions and adverse environmental conditions. Utilizing this fusion technique is an effective method of increasing the accuracy of a semantic segmentation model. Being a lightweight architecture is key for successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. The Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensor fusion approach, and as such was selected as the baseline architecture for this research.

Many improvements were made upon the baseline architecture by leveraging a variety of techniques, including a novel categorical cross-entropy dice loss function, squeeze-and-excitation (SE) blocks, pyramid pooling, a new fusion technique, and drop-input data augmentation. These improvements culminated in the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to the lightweight variants FTFNet Lite 1 and 2. The FTFNet family was trained on the Multi-Spectral Road Scenarios (MSRS) and MIL-Coaxials visual/LWIR datasets. The proposed modifications led to an improvement over the baseline in mean intersection over union (mIoU) of 2.92% and 2.03% for FTFNet and FTFNet Lite 2, respectively, when trained on the MSRS dataset. Additionally, when trained on the MIL-Coaxials dataset, the FTFNet family showed mIoU improvements of 8.69%, 4.4%, and 5.0% for FTFNet, FTFNet Lite 1, and FTFNet Lite 2, respectively.
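One of the FTFNet improvements named above, a combined categorical cross-entropy dice loss, could be formulated roughly as follows. This is an assumed sketch rather than the thesis implementation; the equal weighting of the two terms and the smoothing constant are illustrative choices.

```python
# Minimal sketch (assumed formulation, not the thesis code): categorical
# cross-entropy combined with a multi-class soft Dice loss for segmentation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossEntropyDiceLoss(nn.Module):
    def __init__(self, dice_weight=1.0, smooth=1.0):
        super().__init__()
        self.dice_weight = dice_weight
        self.smooth = smooth
        self.ce = nn.CrossEntropyLoss()

    def forward(self, logits, target):
        # logits: (B, C, H, W) raw scores; target: (B, H, W) class indices.
        ce_loss = self.ce(logits, target)
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1])   # (B, H, W, C)
        one_hot = one_hot.permute(0, 3, 1, 2).float()              # (B, C, H, W)
        dims = (0, 2, 3)                                           # sum over batch and pixels
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2.0 * intersection + self.smooth) / (cardinality + self.smooth)
        return ce_loss + self.dice_weight * (1.0 - dice.mean())

if __name__ == "__main__":
    logits = torch.randn(2, 9, 64, 80, requires_grad=True)  # 9 classes, RGB-T scene
    target = torch.randint(0, 9, (2, 64, 80))
    loss = CrossEntropyDiceLoss()(logits, target)
    loss.backward()
    print(float(loss))
```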