Browsing by Subject "CNN"
Now showing 1 - 10 of 12
Item 3-Level Residual Capsule Network for Complex Datasets (IEEE, 2020-02) Bhamidi, Sree Bala Shruthi; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology
Convolutional Neural Networks (CNNs) have brought substantial improvements to the field of Machine Learning, but they come with their own set of drawbacks. Capsule Networks address the limitations of CNNs and have shown great improvement by calculating the pose and transformation of the image. Deeper networks are more powerful than shallow networks but are, at the same time, more difficult to train. Residual Networks ease training and have shown evidence that they can give good accuracy at considerable depth. The Residual Capsule Network [15] puts the Residual Network and the Capsule Network together. Though it did well on simple datasets such as MNIST, the architecture can be improved to do better on complex datasets like CIFAR-10. This brings us to the idea of the 3-Level Residual Capsule Network, which not only decreases the number of parameters compared to the seven-ensemble model, but also performs better on complex datasets than the Residual Capsule Network.

Item A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection (2024-05) Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher
The advancement of autonomous vehicle technologies has taken a significant leap with the development of an improved version of the Multimodal Vehicle Detection Network (MVDNet), distinguished by the integration of a multi-head attention layer. This key enhancement significantly refines the network's capability to process and integrate multimodal sensor data, an aspect that becomes crucial under challenging weather conditions. The effectiveness of the upgraded Multi-Head MVDNet is rigorously verified on an extensive dataset acquired from the Oxford Radar Robotcar, demonstrating its enhanced performance. Notably, in complex environmental conditions, the Multi-Head MVDNet shows a marked superiority in Average Precision (AP) over existing models, underscoring its advanced detection capabilities. The transition from the traditional MVDNet to the enhanced Multi-Head Vehicle Detection Network signifies a notable breakthrough in vehicle detection technologies, with a special emphasis on operation under severe meteorological conditions such as the obscuring presence of dense fog or the complexities introduced by heavy snowfall. This enhancement capitalizes on the foundational principles of the original MVDNet, which amalgamates the individual strengths of lidar and radar sensors through a refined process of feature tensor fusion, creating a more robust and comprehensive sensory data interpretation framework. A major innovation introduced in this updated model is the multi-head attention layer, which serves as a sophisticated replacement for the previously employed self-attention mechanism. Segmenting the attention mechanism into several distinct partitions enhances the network's efficiency and accuracy in processing and interpreting vast arrays of sensor data. An exhaustive series of experimental analyses was undertaken to determine the optimal configuration of this multi-head attention mechanism. These experiments explored various combinations and settings, ultimately identifying a configuration of seven distinct attention heads as the most effective; this setup was found to optimize the balance between computational efficiency and detection accuracy. When tested on the rich radar and lidar datasets from the ORR project, the advanced Multi-Head MVDNet configuration consistently demonstrated its superiority. It not only surpassed the performance of the original MVDNet but also showed marked improvements over models that relied solely on lidar data or on the DEF models, especially in vehicular detection accuracy. This enhancement of the MVDNet model, with its focus on multi-head attention, represents a significant leap in the field of autonomous vehicle detection and lays a foundation for future research. It opens new pathways for exploring various attention mechanisms and their applicability to real-time vehicle detection, and it accentuates the importance of sophisticated sensor fusion techniques as vital tools for overcoming the challenges posed by adverse environmental conditions, paving the way for more resilient and reliable autonomous vehicular technologies.
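To make the attention substitution concrete, the following is a minimal PyTorch sketch, not the MVDNet implementation: only the seven-head configuration comes from the abstract, while the module name, embedding width (224, chosen to be divisible by 7), and tensor shapes are illustrative assumptions.

# Illustrative sketch of a 7-head attention layer over fused lidar/radar
# feature tokens (hypothetical shapes; not the MVDNet codebase).
import torch
import torch.nn as nn

class MultiHeadSensorFusion(nn.Module):
    def __init__(self, embed_dim: int = 224, num_heads: int = 7):
        super().__init__()
        # embed_dim must be divisible by num_heads (224 = 7 * 32).
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, lidar_feats: torch.Tensor, radar_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate modalities along the token dimension so the attention
        # heads can weigh cross-sensor interactions.
        fused = torch.cat([lidar_feats, radar_feats], dim=1)  # (B, L1+L2, C)
        attended, _ = self.attn(fused, fused, fused)
        return self.norm(fused + attended)  # residual connection

# Dummy feature maps flattened to (batch, tokens, channels):
fusion = MultiHeadSensorFusion()
out = fusion(torch.randn(2, 64, 224), torch.randn(2, 64, 224))  # (2, 128, 224)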
Item AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources (2021-08) Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.
The research work presented in this thesis proposes a neoteric variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces the group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to remove redundant and insignificant elements that either are irrelevant or do not affect the performance of the network. Cardinality, a new dimension added to the existing spatial dimensions, and a class-balanced focal loss function, a weighting factor inversely proportional to the number of samples, have been incorporated into the design of CondenseNeXt's algorithm to relieve the harsh effects of pruning. Furthermore, extensive analyses of this novel CNN architecture were performed on three benchmark image datasets, CIFAR-10, CIFAR-100, and ImageNet, by deploying the trained weights onto an ARM-based embedded computing platform, the NXP BlueBox 2.0, for real-time image classification. The outputs are observed in real time in RTMaps Remote Studio's console to verify the correctness of the predicted classes. CondenseNeXt achieves state-of-the-art image classification performance on the three benchmark datasets, with CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error), and ImageNet (7.91% single-model, single-crop top-5 error), and up to a 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, at the cost of a 2.26% loss in accuracy. It can thus perform image classification on ARM-based computing platforms with outstanding efficiency, without requiring CUDA-enabled GPU support.
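As a quick illustration of the core substitution described above (group convolutions replaced by depthwise separable convolutions), here is a minimal PyTorch sketch; the channel counts and class name are assumptions, not the CondenseNeXt source.

# A depthwise separable convolution splits a standard conv into a per-channel
# spatial conv followed by a 1x1 channel-mixing conv (illustrative sizes).
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A standard 3x3 conv with 128 in/out channels uses 128*128*9 weights;
# the separable version uses 128*9 + 128*128, roughly 8x fewer.
block = DepthwiseSeparableConv(128, 128)
y = block(torch.randn(1, 128, 32, 32))  # -> (1, 128, 32, 32)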
Item BIoU: An Improved Bounding Box Regression for Object Detection (MDPI, 2022-09-28) Ravi, Niranjan; Naqvi, Sami; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology
Object detection is a predominant challenge in computer vision and image processing: detecting instances of objects of various classes within an image or video. Recently, a new domain of vehicular platforms, e-scooters, has come into wide use across domestic and urban environments. The driving behavior of e-scooter users differs significantly from that of other vehicles on the road, and their interactions with pedestrians are also increasing. To ensure pedestrian safety and develop an efficient traffic monitoring system, a reliable object detection system for e-scooters is required. However, existing object detectors based on IoU loss functions suffer various drawbacks when dealing with densely packed objects or inaccurate predictions. To address this problem, a new loss function, balanced-IoU (BIoU), is proposed in this article. This loss function considers the parameterized distance between the centers and the minimum and maximum edges of the bounding boxes to address the localization problem. With the help of synthetic data, a simulation experiment was carried out to analyze the bounding box regression of various losses. Extensive experiments were carried out on a two-stage object detector, Mask R-CNN, and on single-stage object detectors such as YOLOv5n6 and YOLOv5x, using Microsoft Common Objects in Context (COCO), SKU110k, and our custom e-scooter dataset. The proposed loss function demonstrated an increment of 3.70% at APS on the COCO dataset, 6.20% at AP55 on SKU110k, and 9.03% at AP80 on the custom e-scooter dataset.
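The abstract does not state the BIoU formula, so the sketch below shows only the general pattern it alludes to: an IoU loss augmented with a normalized center-distance penalty (in the spirit of DIoU). It is a hypothetical illustration, not the published BIoU.

# Illustrative IoU loss with a center-distance term (NOT the published BIoU;
# a generic DIoU-style sketch of the idea only). Boxes are (x1, y1, x2, y2).
import torch

def iou_center_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Intersection area of each predicted/target box pair, shape (N, 4).
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Squared distance between box centers, normalized by the diagonal of the
    # smallest enclosing box (one way to "parameterize" the distance).
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    center_dist = ((cp - ct) ** 2).sum(dim=1)
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-7

    return (1 - iou + center_dist / diag).mean()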
Item Deep Multimodal Physiological Learning of Cerebral Vasoregulation Dynamics on Stroke Patients Towards Precision Brain Medicine (2024-08) Tipparti, Akanksha; Zhang, Qingxue; King, Brian; Yung-Ping Chien, Stanley
Impaired cerebral vasoregulation is one of the most common post-ischemic stroke effects. Diagnosis and prevention of this condition are often invasive, costly, and ineffective. The impairment prevents the cerebral blood vessels from properly regulating blood flow, which is essential for normal brain function. Developing accurate, non-invasive, and efficient methods to detect this condition aids better stroke diagnosis and prevention. The aim of this thesis is to develop deep learning techniques for detecting cerebral vasoregulation impairments by analyzing physiological signals. This research employs deep learning techniques such as Convolutional Neural Networks (CNNs), MobileNet, and Long Short-Term Memory (LSTM) networks to classify a variety of physiological signals from the PhysioNet database, including Electrocardiogram (ECG), Transcranial Doppler (TCD), Electromyogram (EMG), and Blood Pressure (BP), as coming from stroke or non-stroke subjects. The effectiveness of these algorithms is demonstrated by a classification accuracy of 90% for the combination of ECG and EMG signals. Furthermore, this research explores the importance of analyzing dynamic physiological activities in determining the impairment. The dynamic activities include the Sit-stand, Sit-stand-balance, Head-up-tilt, and Walk datasets from the PhysioNet website. CNN and MobileNetV3 models are employed to classify these signals, attempting to identify cerebral health. The accuracy and robustness of these methods are greatly enhanced when multiple signals are integrated. Overall, this study highlights the potential of deep multimodal physiological learning in the development of precision brain medicine, further enhancing stroke diagnosis. The results pave the way for the development of advanced diagnostic tools to determine cerebral health.
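A minimal sketch, assuming PyTorch and hypothetical layer sizes, of the kind of 1D-CNN-plus-LSTM pipeline the abstract describes for classifying fused signals such as ECG and EMG as stroke or non-stroke; it is not the thesis code.

# Illustrative 1D-CNN feature extractor feeding an LSTM classifier for
# two-channel physiological windows (all sizes assumed for the sketch).
import torch
import torch.nn as nn

class PhysioCNNLSTM(nn.Module):
    def __init__(self, in_channels: int = 2, num_classes: int = 2):
        super().__init__()
        # in_channels=2 models a fused ECG+EMG input; purely illustrative.
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, samples) -> CNN features -> (batch, time, 64)
        feats = self.cnn(x).transpose(1, 2)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])  # logits over stroke / non-stroke

model = PhysioCNNLSTM()
logits = model(torch.randn(8, 2, 1024))  # 8 windows of 1024 samples each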
Item Efficient Intelligence Towards Real-Time Precision Medicine With Systematic Pruning and Quantization (2024-08) Karunakaran, Maneesh; Zhang, Qingxue; King, Brian; Rizkalla, Maher E.
The widespread adoption of Convolutional Neural Networks (CNNs) in real-world applications, particularly on resource-constrained devices, is hindered by their computational complexity and memory requirements. This research investigates the application of pruning and quantization techniques to optimize CNNs for arrhythmia classification using the MIT-BIH Arrhythmia Database. By combining magnitude-based pruning, regularization-based pruning, filter-map-based pruning, and quantization at different bit-widths (4-bit, 8-bit, 2-bit, and 1-bit), the study aims to develop a more compact and efficient CNN model while maintaining high accuracy. The experimental results demonstrate that these techniques effectively reduce model size, improve inference speed, and maintain accuracy, adapting the models for use on devices with limited resources. The findings highlight the potential of these optimization techniques for real-time applications in mobile health monitoring and edge computing, paving the way for broader adoption of deep learning in resource-limited environments.

Item Enhanced 3D Object Detection and Tracking in Autonomous Vehicles: An Efficient Multi-Modal Deep Fusion Approach (2024-08) Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed; King, Brian S.; Rizkalla, Maher E.; Abdallah, Mustafa A.
This dissertation delves into a significant challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Systems that rely solely on cameras face difficulties with visibility over long distances, while radar-only systems struggle to recognize features like stop signs, which are crucial for safe navigation in such scenarios. To overcome this limitation, this research introduces a novel deep camera-radar fusion approach using neural networks. This method ensures reliable AV perception regardless of weather or lighting conditions. Cameras, similar to human vision, are adept at capturing rich semantic information, whereas radars can penetrate obstacles like fog and darkness, akin to X-ray vision. The dissertation presents NeXtFusion, an innovative and efficient camera-radar fusion network designed specifically for robust AV perception. Building on the efficient single-sensor NeXtDet neural network, NeXtFusion significantly enhances object detection accuracy and tracking. A notable feature of NeXtFusion is its attention module, which refines critical feature representation for object detection, minimizing information loss when processing data from both cameras and radars. Extensive experiments conducted on large-scale datasets such as Argoverse, Microsoft COCO, and nuScenes thoroughly evaluate the capabilities of NeXtDet and NeXtFusion. The results show that NeXtFusion excels in detecting small and distant objects compared to existing methods. Notably, NeXtFusion achieves a state-of-the-art mAP score of 0.473 on the nuScenes validation set, outperforming competitors like OFT by 35.1% and MonoDIS by 9.5%. NeXtFusion's excellence extends beyond mAP scores: it also performs well in other crucial metrics, including mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Visualizations of real-world scenarios from the nuScenes dataset processed by NeXtFusion provide compelling evidence of its capability to handle diverse and challenging environments.
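The abstract credits an attention module that refines fused camera-radar features but does not specify its design. The sketch below is a generic squeeze-and-excitation-style channel gate over concatenated feature maps, offered only to illustrate the re-weighting idea; every name and shape is assumed, and this is not the NeXtFusion module.

# Generic attention-gated camera-radar feature fusion (illustrative only).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, cam_ch: int, radar_ch: int, reduction: int = 8):
        super().__init__()
        fused_ch = cam_ch + radar_ch
        # Squeeze-and-excitation style gate: global pool, bottleneck, sigmoid.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused_ch, fused_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused_ch // reduction, fused_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, cam: torch.Tensor, radar: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([cam, radar], dim=1)  # channel-wise concatenation
        return fused * self.gate(fused)         # emphasize informative channels

fusion = AttentionFusion(cam_ch=64, radar_ch=32)
out = fusion(torch.randn(1, 64, 40, 40), torch.randn(1, 32, 40, 40))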
Item HBONext: An Efficient DNN for Light Edge Embedded Devices (2021-05) Joshi, Sanket Ramesh; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher
Every year, the most effective deep learning models and CNN architectures are showcased based on their compatibility and performance on embedded edge hardware, especially for applications like image classification. These deep learning models necessitate a significant amount of computation and memory, so they can only be used on high-performance computing systems like CPUs or GPUs, and they often struggle to meet portable specifications due to resource, energy, and real-time constraints. Hardware accelerators have recently been designed to provide the computational resources that AI and machine learning tools need. These edge accelerators feature high-performance hardware that helps maintain the precision needed to accomplish this mission. Furthermore, the classification task has benefited from the inclusion of Bottleneck modules, which investigate channel interdependencies using either depth-wise or group-wise convolutional features. The classic inverted residual block, a well-known architectural technique, has gained recognition because of its increasing use in portable applications. This work takes it a step forward by introducing a design method for porting CNNs to low-resource embedded systems, essentially bridging the gap between deep learning models and embedded edge systems. To achieve these goals, we use closer computing strategies to reduce the computational load and memory usage while retaining excellent deployment efficiency. This thesis introduces HBONext, a mutated version of Harmonious Bottlenecks (DHbneck) combined with a Flipped version of the Inverted Residual (FIR), which outperforms the current HBONet architecture in terms of accuracy and model size miniaturization. Unlike the current definition of the inverted residual, the FIR block performs identity mapping and spatial transformation at its higher dimensions. The HBO solution, on the other hand, focuses on two orthogonal dimensions, spatial (H/W) contraction-expansion and subsequent channel (C) expansion-contraction, both organized in a bilaterally symmetric manner. HBONext is a version designed specifically for embedded and mobile applications. This research also shows how to use the NXP BlueBox 2.0 to build a real-time HBONext image classifier; the integration of the model into this hardware was successful owing to its limited model size of 3 MB. The model was trained and validated on the CIFAR-10 dataset, where it performed exceptionally well thanks to its smaller size and higher accuracy. The baseline HBONet architecture achieves a validation accuracy of 80.97% with a model size of 22 MB, whereas the proposed HBONext variants achieve a higher validation accuracy of 89.70% with a model size of 3.00 MB, measured by the number of parameters. The performance metrics of the HBONext architecture and its variants are compared in the following chapters.
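For context on what the FIR block departs from, here is a minimal PyTorch sketch of the classic inverted residual block (expand, depthwise transform, project, with identity mapping at the low-dimensional ends). The flipped FIR variant itself is only summarized in the abstract and is not reproduced here; all sizes are assumptions.

# Classic MobileNetV2-style inverted residual block (illustrative sizes).
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, ch: int, expansion: int = 4):
        super().__init__()
        hidden = ch * expansion
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),         # expand channels
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),          # depthwise spatial conv
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, ch, 1, bias=False),         # project back down
            nn.BatchNorm2d(ch),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity mapping happens at the low-dimensional ends here; per the
        # abstract, FIR instead performs it at the higher dimensions.
        return x + self.block(x)

y = InvertedResidual(32)(torch.randn(1, 32, 56, 56))  # -> (1, 32, 56, 56)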
Item Multi-spectral Fusion for Semantic Segmentation Networks (2023-05) Edwards, Justin; El-Sharkawy, Mohamed; King, Brian; Kim, Dongsoo
Semantic segmentation is a machine learning task seeing increased utilization in multiple fields, from medical imagery, to land demarcation, to autonomous vehicles. Semantic segmentation performs pixel-wise classification of images, creating a new, segmented representation of the input that can be useful for detecting various terrain and objects within an image. Recently, convolutional neural networks have been heavily utilized in neural networks tackling the semantic segmentation task, particularly in the field of autonomous driving systems. The requirements of advanced driver assistance systems (ADAS) drive semantic segmentation models targeted for deployment on ADAS to be lightweight while maintaining accuracy. A commonly used method to increase accuracy in the autonomous vehicle field is to fuse multiple sensory modalities. This research focuses on leveraging the fusion of long-wave infrared (LWIR) imagery with visual-spectrum imagery to fill in the inherent performance gaps of visual imagery alone. This comes with a host of benefits, such as increased performance in various lighting conditions and adverse environmental conditions. Utilizing this fusion technique is an effective method of increasing the accuracy of a semantic segmentation model. A lightweight architecture is key to successful deployment on ADAS, as these systems often have resource constraints and need to operate in real time. The Multi-Spectral Fusion Network (MFNet) [1] meets these requirements by leveraging a sensory fusion approach, and as such was selected as the baseline architecture for this research. Many improvements were made upon the baseline architecture by leveraging a variety of techniques, including the proposal of a novel loss function, categorical cross-entropy dice loss; the introduction of squeeze-and-excitation (SE) blocks; the addition of pyramid pooling; a new fusion technique; and drop-input data augmentation. These improvements culminated in the creation of the Fast Thermal Fusion Network (FTFNet). Further improvements were made by introducing depthwise separable convolutional layers, leading to lightweight FTFNet variants, FTFNet Lite 1 and 2. The FTFNet family was trained on the Multi-Spectral Road Scenarios (MSRS) and MIL-Coaxials visual/LWIR datasets. The proposed modifications led to improvements over the baseline in mean intersection over union (mIoU) of 2.92% and 2.03% for FTFNet and FTFNet Lite 2, respectively, when trained on the MSRS dataset. Additionally, when trained on the MIL-Coaxials dataset, the FTFNet family showed improvements in mIoU of 8.69%, 4.4%, and 5.0% for FTFNet, FTFNet Lite 1, and FTFNet Lite 2.

Item Pruning Convolution Neural Network (SqueezeNet) for Efficient Hardware Deployment (2018-12) Gaikwad, Akash S.; El-Sharkawy, Mohamed; Rizkalla, Maher; King, Brian
In recent years, deep learning models have become popular in real-time embedded applications, but hardware deployment involves many complexities because of limited resources such as memory, computational power, and energy. Recent research in the field of deep learning focuses on reducing the model size of Convolutional Neural Networks (CNNs) through various compression techniques such as architectural compression, pruning, quantization, and encoding (e.g., Huffman encoding). Network pruning is one of the promising techniques for solving these problems. This thesis proposes methods to prune the convolutional neural network (SqueezeNet) without introducing network sparsity in the pruned model. Three methods are proposed to prune the CNN and decrease the model size without a significant drop in accuracy: (1) pruning based on the Taylor expansion of the change in the cost function, ΔC; (2) pruning based on the L2 normalization of activation maps; and (3) pruning based on a combination of methods 1 and 2. The proposed methods use various ranking criteria to rank the convolution kernels and prune the lower-ranked filters; afterwards, the SqueezeNet model is fine-tuned by backpropagation. A transfer learning technique is used to train SqueezeNet on the CIFAR-10 dataset. Results show that the proposed approach reduces the SqueezeNet model size by 72% without a significant drop in the accuracy of the model (the optimal pruning efficiency result). Results also show that pruning based on the combination of the Taylor expansion of the cost function and the L2 normalization of activation maps achieves better pruning efficiency than the individual pruning criteria, and that most of the pruned kernels come from mid- and high-level layers. The pruned model was deployed on the BlueBox 2.0 using RTMaps software, and its performance was evaluated.
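A minimal sketch, under assumed PyTorch conventions, of the two ranking criteria and their combination described above. The per-layer score normalization, the sum-based combination, and the variable names are illustrative choices, not the thesis implementation.

# Illustrative filter-ranking criteria for pruning one conv layer.
# activation, gradient: (batch, filters, H, W), e.g. captured via hooks.
import torch

def taylor_rank(activation: torch.Tensor, gradient: torch.Tensor) -> torch.Tensor:
    # Method 1: the change in cost from removing a feature map is
    # approximated by |activation * gradient|, averaged over the map.
    scores = (activation * gradient).abs().mean(dim=(0, 2, 3))
    # Normalize within the layer so scores are comparable across layers.
    return scores / (scores.norm() + 1e-8)

def l2_rank(activation: torch.Tensor) -> torch.Tensor:
    # Method 2: rank filters by the L2 norm of their activation maps.
    scores = activation.pow(2).sum(dim=(0, 2, 3)).sqrt()
    return scores / (scores.norm() + 1e-8)

# Method 3 combines both criteria; summing the normalized scores is one
# simple option. Lower-ranked filters are pruned, then the model is fine-tuned.
act = torch.randn(16, 64, 28, 28)
grad = torch.randn_like(act)
combined = taylor_rank(act, grad) + l2_rank(act)
prune_idx = combined.argsort()[:16]  # indices of the 16 lowest-ranked filters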