IU Indianapolis ScholarWorks :: Browsing by Subject "YOLO"

Browsing by Subject "YOLO"

Now showing 1 - 3 of 3

BIoU: An Improved Bounding Box Regression for Object Detection
(MDPI, 2022-09-28) Ravi, Niranjan; Naqvi, Sami; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology
Object detection is a predominant challenge in computer vision and image processing to detect instances of objects of various classes within an image or video. Recently, a new domain of vehicular platforms, e-scooters, has been widely used across domestic and urban environments. The driving behavior of e-scooter users significantly differs from other vehicles on the road, and their interactions with pedestrians are also increasing. To ensure pedestrian safety and develop an efficient traffic monitoring system, a reliable object detection system for e-scooters is required. However, existing object detectors based on IoU loss functions suffer various drawbacks when dealing with densely packed objects or inaccurate predictions. To address this problem, a new loss function, balanced-IoU (BIoU), is proposed in this article. This loss function considers the parameterized distance between the centers and the minimum and maximum edges of the bounding boxes to address the localization problem. With the help of synthetic data, a simulation experiment was carried out to analyze the bounding box regression of various losses. Extensive experiments have been carried out on a two-stage object detector, MASK_RCNN, and single-stage object detectors such as YOLOv5n6, YOLOv5x on Microsoft Common Objects in Context, SKU110k, and our custom e-scooter dataset. The proposed loss function demonstrated an increment of 3.70% at 𝐴𝑃𝑆 on the COCO dataset, 6.20% at AP55 on SKU110k, and 9.03% at AP80 of the custom e-scooter dataset.
Enhancing Precision of Object Detectors: Bridging Classification and Localization Gaps for 2D and 3D Models
(2024-05) Ravi, Niranjan; El-Sharkawy, Mohamed; Rizkalla, Maher E.; Li, Lingxi; King, Brian S.
Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible due to Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as Camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has spiked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation. This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%. To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as when objects are inclusive of one another, two objects with the same centres, two objects with the same centres and similar aspect ratios. During our analysis, we observed existing intersections over Union (IoU) loss and its variant’s failure to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational efforts. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on PASCAL VOC and CGMU datasets. With help of ROS and NVIDIA’s devices, inference speed was observed in real-time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. The evaluation results indicated MASK_RCNN network trained on the COCO dataset, YOLOv5n6 network trained on SKU-110K and YOLOv5x trained on the custom e-scooter dataset demonstrated 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap. In the earlier parts, we primarily focused on 2D object detectors. Owing to its success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the objects’ Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and Fusion methods on 3D KITTI and nuScenes datasets. Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are volatile to harsh environmental conditions, sunlight, and blurry motion. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with MaskedAuto Encoders (MAE) and Graph Neural Networks (GNN) along with earlier proposed IIoU and ENHM. The performance evaluation on MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with overlap of 50%. This developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-art 3D object detectors for autonomous platforms.
Temporary Traffic Control Device Detection for Road Construction Projects Using Deep Learning Application
(ASCE, 2022-03-07) Seo, Sungchul; Chen, Donghui; Kim, Kwangcheol; Kang, Kyubyung; Koo, Dan; Chae, Myungjin; Park, Hyung Keun; Mechanical and Energy Engineering, School of Engineering and Technology
Traffic control devices in road construction zones play important roles, which (1) provide critical traffic-related information for the drivers, (2) prevent potential crashes near work zones, and (3) protect work crews’ safety. Due to the number of devices in each site, transportation agencies have faced challenges in timely and frequently inspecting traffic control devices, including temporary devices. Deep learning applications can support these inspection processes. The first step of the inspection using deep learning is recognizing traffic control devices in the work zone. This study collected road images using vehicle-mounted cameras from various illuminance and weather conditions. Then, the study (1) labeled eight classes of temporary traffic control devices (TTCDs), (2) modified and trained a machine-learning model using the YOLOv3 algorithm, and (3) tested the detection outcomes of various TTCDs. The key finding shows that the proposed model recognized more than 98% of the temporary traffic signs correctly and approximately 81% of temporary traffic control devices correctly. The construction barricade had the lowest mean Average Precision (50%) out of eight classes. The outcomes can be used as the first step of autonomous safety inspections for road construction projects.

Browsing by Subject "YOLO"

Results Per Page

Sort Options