Browsing by Subject "Object Detection"
Now showing 1 - 6 of 6
Item: A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection (2024-05)
Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

This work presents an improved version of the Multimodal Vehicle Detection Network (MVDNet) for autonomous vehicles, distinguished by a multi-head attention layer that replaces the original self-attention mechanism. MVDNet amalgamates the individual strengths of lidar and radar sensors through feature tensor fusion, a capability that becomes crucial under challenging weather conditions such as dense fog and heavy snowfall. Partitioning the attention mechanism into several distinct heads improves the network's efficiency and accuracy in processing and interpreting large volumes of multimodal sensor data. An exhaustive series of experiments explored configurations of the multi-head mechanism and identified seven attention heads as the most effective setup, optimizing the balance between computational efficiency and detection accuracy. Evaluated on the radar and lidar datasets of the Oxford Radar Robotcar (ORR) project, the Multi-Head MVDNet consistently achieved higher Average Precision (AP) than the original MVDNet, lidar-only models, and the DEF models, with its advantage most pronounced in complex environmental conditions. Beyond this performance gain, the work opens new pathways for exploring attention mechanisms in real-time vehicle detection and underscores the importance of sophisticated sensor fusion in overcoming adverse environmental conditions, paving the way for more resilient and reliable autonomous vehicle technologies.
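To make the architectural change concrete, the sketch below applies PyTorch's built-in nn.MultiheadAttention with seven heads to a fused lidar-radar feature sequence. Only the use of seven-head attention over fused feature tensors comes from the abstract; the tensor shapes, the concatenation-plus-projection fusion, and all names are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch: multi-head attention over fused lidar/radar features.
# Shapes and the concatenation-based fusion are illustrative assumptions;
# only the seven-head multi-head attention comes from the abstract.
import torch
import torch.nn as nn

class MultiHeadFusion(nn.Module):
    def __init__(self, feat_dim: int = 252, num_heads: int = 7):
        super().__init__()
        # Hypothetical projection from concatenated lidar+radar features.
        self.proj = nn.Linear(2 * feat_dim, feat_dim)
        # embed_dim must be divisible by num_heads (252 = 7 * 36).
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, lidar_feats, radar_feats):
        # lidar_feats, radar_feats: (batch, seq_len, feat_dim)
        fused = self.proj(torch.cat([lidar_feats, radar_feats], dim=-1))
        out, _ = self.attn(fused, fused, fused)  # self-attention over fused tokens
        return out

# feat_dim is chosen as a multiple of 7 so the head split is valid.
model = MultiHeadFusion(feat_dim=252, num_heads=7)
lidar = torch.randn(2, 100, 252)
radar = torch.randn(2, 100, 252)
print(model(lidar, radar).shape)  # torch.Size([2, 100, 252])
```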
Item: Enhanced 3D Object Detection and Tracking in Autonomous Vehicles: An Efficient Multi-Modal Deep Fusion Approach (2024-08)
Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed; King, Brian S.; Rizkalla, Maher E.; Abdallah, Mustafa A.

This dissertation addresses a significant challenge for autonomous vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Camera-only systems struggle with visibility over long distances, while radar-only systems cannot recognize features such as stop signs that are crucial for safe navigation. To overcome this limitation, the research introduces NeXtFusion, a novel and efficient deep camera-radar fusion network that ensures reliable AV perception regardless of weather or lighting: cameras, like human vision, capture rich semantic information, whereas radar penetrates obstacles such as fog and darkness. Built on the efficient single-sensor NeXtDet network, NeXtFusion significantly improves object detection accuracy and tracking, and its attention module refines critical feature representations while minimizing information loss when fusing camera and radar data. Extensive experiments on large-scale datasets including Argoverse, Microsoft COCO, and nuScenes show that NeXtFusion excels at detecting small and distant objects, achieving a state-of-the-art mAP of 0.473 on the nuScenes validation set, outperforming OFT by 35.1% and MonoDIS by 9.5%, with strong results on other key metrics such as mATE (0.449) and mAOE (0.534), which highlights its overall effectiveness in 3D object detection. Visualizations of real-world nuScenes scenarios provide further evidence of its ability to handle diverse and challenging environments.

Item: Improving Object Detection using Enhanced EfficientNet Architecture (2023-08)
Kamel Ibrahim, Michael; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

EfficientNet is designed to achieve top accuracy while using fewer parameters and less computation than previous models. This thesis presents a compound scaling method that jointly re-weights the network's width (w), depth (d), and resolution (r), which outperforms traditional methods that tune hyperparameters to scale only one or two of these dimensions, and it introduces an enhanced EfficientNet backbone architecture. EfficientNet achieves top accuracy on the ImageNet dataset while being up to 8.4x smaller and up to 6.1x faster than previous top-performing models, and its effectiveness carries over to transfer learning and object detection tasks, where it reaches higher accuracy with fewer parameters and less computation. The proposed enhanced architecture is discussed in detail and compared with the original, offering a scalable and efficient solution for both academic research and practical applications where resource constraints are often a limiting factor.
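The compound scaling rule that the EfficientNet abstract builds on can be stated in a few lines. The coefficients alpha = 1.2, beta = 1.1, gamma = 1.15 come from the original EfficientNet paper, chosen by grid search under the constraint alpha * beta^2 * gamma^2 ~ 2 so that total FLOPS grow roughly as 2^phi; the base depth, width, and resolution values in the sketch are placeholders, not the thesis's enhanced architecture.

```python
# Sketch of EfficientNet-style compound scaling: depth, width, and input
# resolution are scaled together by a single compound coefficient phi.
# alpha, beta, gamma are the coefficients reported in the original
# EfficientNet paper (grid search, with alpha * beta^2 * gamma^2 ~ 2).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int, base_depth: int = 18, base_width: int = 64,
                   base_resolution: int = 224):
    """Return (layers, channels, resolution) for compound coefficient phi."""
    depth = round(base_depth * ALPHA ** phi)             # more layers
    width = round(base_width * BETA ** phi)              # wider channels
    resolution = round(base_resolution * GAMMA ** phi)   # larger inputs
    return depth, width, resolution

# The base_* values are illustrative placeholders, not the actual
# EfficientNet-B0 configuration.
for phi in range(5):
    print(f"phi={phi}: depth/width/resolution = {compound_scale(phi)}")
```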
Item: Improving the Robustness of Object Detection Through a Multi-Camera–Based Fusion Algorithm Using Fuzzy Logic (Frontiers, 2021)
Khan, Md Nazmuzzaman; Al Hasan, Mohammad; Anwar, Sohel; Mechanical and Energy Engineering, School of Engineering and Technology

A single camera paired with a convolutional neural network (CNN) detector creates a bounding box (BB) for each detected object with a certain accuracy, but even when detector accuracy is high, a single RGB camera may fail to capture the actual object within the BB. This research addresses that limitation through multiple cameras, projective transformation, and fuzzy logic–based fusion. The proposed algorithm generates a "confidence score" for each frame to check the trustworthiness of the BBs produced by the CNN detector. As a first step toward this solution, a two-camera setup was built, with agricultural weeds as the objects to be detected. When a weed is present, a CNN detector generates a BB in each camera view; a projective transformation then maps one camera's image plane onto the other's, and the intersection over union (IOU) overlap of the BBs is computed for correctly detected objects. Ground-truth IOU overlaps were established for four scenarios defined by the object's distance from the multi-camera setup. When objects are detected correctly and the BBs sit at the correct distance, the measured IOU should be close to the ground-truth value; BBs at incorrect positions produce divergent IOU values. Mamdani fuzzy rules built on this reasoning assign each frame one of three confidence scores ("high," "ok," or "low") based on the accuracy and position of the BBs. The algorithm was tested under different conditions to check its validity, and the confidence scores across three scenarios support the hypothesis that the multi-camera fusion algorithm improves the overall robustness of the detection system.
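The geometric check at the heart of the multi-camera method is straightforward to sketch: project one camera's bounding box into the other camera's image plane, measure the IOU of the two boxes, and compare it with the scenario's ground-truth IOU. The boxes and thresholds below are invented, and the hard three-way cutoff is only a crude stand-in for the paper's Mamdani fuzzy inference.

```python
# Sketch of the IOU consistency check behind the fuzzy confidence score.
# Boxes are (x1, y1, x2, y2). The thresholds are invented placeholders;
# the paper uses Mamdani fuzzy rules rather than hard cutoffs.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def confidence(measured_iou, ground_truth_iou, tol=0.1):
    """Crude three-level stand-in for the Mamdani fuzzy inference."""
    gap = abs(measured_iou - ground_truth_iou)
    if gap <= tol:
        return "high"
    if gap <= 2 * tol:
        return "ok"
    return "low"

# box_cam2_projected would come from a homography mapping camera 1's
# image plane onto camera 2's; the values here are illustrative only.
box_cam1 = (100, 120, 220, 260)
box_cam2_projected = (110, 125, 225, 255)
m = iou(box_cam1, box_cam2_projected)
print(f"IOU={m:.3f}, confidence={confidence(m, ground_truth_iou=0.85)}")
```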
Item: Object Detection Using Vision Transformed EfficientDet (2023-08)
Kar, Shreyanil; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.

This research presents a novel approach to object detection that integrates Vision Transformers (ViT) into the EfficientDet architecture. Recent advances in deep learning, particularly convolutional neural networks (CNNs), have greatly improved the accuracy and efficiency of computer vision systems, and widely used detection frameworks such as RetinaNet, EfficientNet, and EfficientDet rely primarily on convolution. The ViT backbone, renowned for its success in image classification and natural language processing, instead employs self-attention to capture global dependencies in input images, but its ability to capture fine-grained detail and contextual information is limited. Integrating ViT into EfficientDet combines the strengths of both: ViT's global dependency modeling alongside EfficientDet's efficient and accurate detection framework, enhancing the network's ability to capture fine-grained details and context. Evaluated on the widely acknowledged PASCAL VOC 2007 and 2012 benchmarks, the integrated ViT-EfficientDet model achieved a mean Average Precision (mAP) of 86.27% on PASCAL VOC 2007, underscoring its potential for real-world applications. Future work may explore additional datasets and evaluate the framework across other computer vision tasks.

Item: Temporary Traffic Control Device Detection for Road Construction Projects Using Deep Learning Application (ASCE, 2022-03-07)
Seo, Sungchul; Chen, Donghui; Kim, Kwangcheol; Kang, Kyubyung; Koo, Dan; Chae, Myungjin; Park, Hyung Keun; Mechanical and Energy Engineering, School of Engineering and Technology

Traffic control devices in road construction zones play important roles: they (1) provide critical traffic-related information to drivers, (2) prevent potential crashes near work zones, and (3) protect the safety of work crews. Given the number of devices at each site, transportation agencies have struggled to inspect traffic control devices, including temporary ones, in a timely manner and at sufficient frequency. Deep learning applications can support these inspection processes, and the first step is recognizing the traffic control devices in the work zone. This study collected road images from vehicle-mounted cameras under various illuminance and weather conditions. The study then (1) labeled eight classes of temporary traffic control devices (TTCDs), (2) modified and trained a machine-learning model using the YOLOv3 algorithm, and (3) tested the detection of the various TTCDs. The proposed model recognized more than 98% of temporary traffic signs and approximately 81% of temporary traffic control devices correctly; the construction barricade had the lowest mean Average Precision (50%) of the eight classes. These outcomes can serve as the first step toward autonomous safety inspections for road construction projects.
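Several of the abstracts above report mean Average Precision, so a minimal sketch of the metric may help: AP is computed per class from a precision-recall curve (here with the 11-point interpolation of the classic PASCAL VOC protocol, matching the ViT-EfficientDet evaluation), and mAP is the mean over classes. All precision-recall values below are fabricated toy numbers, not results from either study.

```python
# Sketch of mAP: per-class Average Precision (11-point interpolated,
# as in the classic PASCAL VOC protocol) averaged over classes.
# The precision/recall points below are fabricated toy values.

def ap_11point(recalls, precisions):
    """11-point interpolated AP from matched recall/precision lists."""
    ap = 0.0
    for r in [i / 10 for i in range(11)]:  # r = 0.0, 0.1, ..., 1.0
        # Interpolated precision: max precision at any recall >= r.
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        ap += max(candidates, default=0.0) / 11
    return ap

# Toy per-class detection curves (recall, precision), sorted by recall.
curves = {
    "cone":      ([0.2, 0.5, 0.8, 0.95], [1.00, 0.90, 0.75, 0.60]),
    "barricade": ([0.1, 0.3, 0.5],       [0.80, 0.60, 0.50]),
}

aps = {cls: ap_11point(r, p) for cls, (r, p) in curves.items()}
mAP = sum(aps.values()) / len(aps)
print(aps, f"mAP={mAP:.3f}")
```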