- Browse by Author
Browsing by Author "El-Sharkawy, Mohamed"
Now showing 1 - 10 of 68
Results Per Page
Sort Options
Item 3-Level Residual Capsule Network for Complex Datasets(IEEE, 2020-02) Bhamidi, Sree Bala Shruthi; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyThe Convolutional Neural Network (CNN) have shown a substantial improvement in the field of Machine Learning. But they do come with their own set of drawbacks. Capsule Networks have addressed the limitations of CNNs and have shown a great improvement by calculating the pose and transformation of the image. Deeper networks are more powerful than shallow networks but at the same time, more difficult to train. Residual Networks ease the training and have shown evidence that they can give good accuracy with considerable depth. Residual Capsule Network [15] has put the Residual Network and Capsule Network together. Though it did well on simple dataset such as MNIST, the architecture can be improved to do better on complex datasets like CIFAR-10. This brings us to the idea of 3-Level Residual Capsule which not only decreases the number of parameters when compared to the seven-ensemble model, but also performs better on complex datasets when compared to Residual Capsule Network.Item A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection(2024-05) Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, MaherThe advancement of autonomous vehicle technologies has taken a significant leap with the development of an improved version of the Multimodal Vehicle Detection Network (MVDNet), distinguished by the integration of a multi-head attention layer. This key enhancement significantly refines the network's capability to process and integrate multimodal sensor data, an aspect that becomes crucial in the face of challenging weather conditions. The effectiveness of this upgraded Multi-Head MVDNet is rigorously verified through an extensive dataset acquired from the Oxford Radar Robotcar, demonstrating its enhanced performance capabilities. Notably, in complex environmental conditions, the Multi-Head MVDNet shows a marked superiority in terms of Average Precision (AP) compared to existing models, underscoring its advanced detection capabilities. The transition from the traditional MVDNet to the enhanced Multi-Head Vehicle Detection Network signifies a notable breakthrough in the arena of vehicle detection technologies, with a special emphasis on operation under severe meteorological conditions, such as the obscuring presence of dense fog or the complexities introduced by heavy snowfall. This significant enhancement capitalizes on the foundational principles of the original MVDNet, which skillfully amalgamates the individual strengths of lidar and radar sensors. This is achieved through an intricate and refined process of feature tensor fusion, creating a more robust and comprehensive sensory data interpretation framework. A major innovation introduced in this updated model is the implementation of a multi-head attention layer. This layer serves as a sophisticated replacement for the previously employed self-attention mechanism. Segmenting the attention mechanism into several distinct partitions enhances the network's efficiency and accuracy in processing and interpreting vast arrays of sensor data. An exhaustive series of experimental analyses was undertaken to determine the optimal configuration of this multi-head attention mechanism. These experiments explored various combinations and settings, ultimately identifying a configuration consisting of seven distinct attention heads as the most effective. This setup was found to optimize the balance between computational efficiency and detection accuracy. When tested using the rich radar and lidar datasets from the ORR project, this advanced Multi-Head MVDNet configuration consistently demonstrated its superiority. It not only surpassed the performance of the original MVDNet but also showed marked improvements over models that relied solely on lidar data or the DEF models, especially in terms of vehicular detection accuracy. This enhancement in the MVDNet model, with its focus on multi-head attention, not only represents a significant leap in the field of autonomous vehicle detection but also lays a foundation for future research. It opens new pathways for exploring various attention mechanisms and their potential applicability in scenarios requiring real-time vehicle detection. Furthermore, it accentuates the importance of sophisticated sensor fusion techniques as vital tools in overcoming the challenges posed by adverse environmental conditions, thus paving the way for more resilient and reliable autonomous vehicular technologies.Item A-MnasNet and Image Classification on NXP Bluebox 2.0(ASTES, 2021-01) Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyComputer Vision is a domain which deals with the challenge of enabling technology with vision capabilities. This goal is accomplished with the use of Convolutional Neural Networks. They are the backbone of implementing vision applications on embedded systems. They are complex but highly efficient in extracting features, thus, enabling embedded systems to perform computer vision applications. After AlexNet won the ImageNet Large Scale Visual Recognition Challenge in 2012, there was a drastic increase in research on Convolutional Neural Networks. The convolutional neural networks were made deeper and wider, in order to make them more efficient. They were able to extract features efficiently, but the computational complexity and the computational cost of those networks also increased. It became very challenging to deploy such networks on embedded hardware. Since embedded systems have limited resources like power, speed and computational capabilities, researchers got more inclined towards the goal of making convolutional neural networks more compact, with efficiency of extracting features similar to that of the novel architectures. This research has a similar goal of proposing a convolutional neural network with enhanced efficiency and further using it for a vision application like Image Classification on NXP Bluebox 2.0, an autonomous driving platform by NXP Semiconductors. This paper gives an insight on the Design Space Exploration technique used to propose A-MnasNet (Augmented MnasNet) architecture, with enhanced capabilities, from MnasNet architecture. Furthermore, it explains the implementation of A-MnasNet on Bluebox 2.0 for Image Classification.Item A-MnasNet: Augmented MnasNet for Computer Vision(IEEE, 2020-08) Shah, Prasham; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyConvolutional Neural Networks (CNNs) play an essential role in Deep Learning. They are extensively used in Computer Vision. They are complicated but very effective in extracting features from an image or a video stream. After AlexNet [5] won the ILSVRC [8] in 2012, there was a drastic increase in research related with CNNs. Many state-of-the-art architectures like VGG Net [12], GoogleNet [13], ResNet [18], Inception-v4 [14], Inception-Resnet-v2 [14], ShuffleNet [23], Xception [24], MobileNet [6], MobileNetV2 [7], SqueezeNet [16], SqueezeNext [17] and many more were introduced. The trend behind the research depicts an increase in the number of layers of CNN to make them more efficient but with that the size of the model increased as well. This problem was fixed with the advent of new algorithms which resulted in a decrease in model size. As a result, today we have CNN models which are implemented on mobile devices. These mobile models are small and fast which in turn reduce the computational cost of the embedded system. This paper resembles similar idea, it proposes a new model Augmented MnasNet (A-MnasNet) which has been derived from MnasNet [1]. The model is trained with CIFAR-10 [4] dataset and has a validation accuracy of 96.89% and a model size of 11.6 MB. It outperforms its baseline architecture MnasNet which has a validation accuracy of 80.8% and a model size of 12.7 MB when trained with CIFAR-10.Item Autonomous Embedded System Enabled 3-D Object Detector: (with Point Cloud and Camera)(IEEE, 2019-09) Katare, Dewant; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyAn Autonomous vehicle or present day smart vehicle is equipped with several ADAS safety features such as Blind Spot Detection, Forward Collision Warning, Lane Departure and Parking Assistance, Surround View System, Vehicular communication System. Recent research utilize deep learning algorithms as a counterfeit for these traditional methods, using optimal sensors. This paper discusses the perception tasks related to autonomous vehicle, specifically the computer-vision approach of 3D object detection and thus proposes a model compatible with embedded system using the RTMaps framework. The proposed model is based on the sensors: camera and Lidar connected to an autonomous embedded system, providing the sensed inputs to the deep learning classifier which on the basis of theses inputs estimates the position and predicts a 3-d bounding box on the physical objects. The Frustum PointNet a contemporary architecture for 3-D object detection is used as base model and is implemented with extended functionality. The architecture is trained and tested on the KITTI dataset and is discussed with the competitive validation precision and accuracy. The Presented model is deployed on the Bluebox 2.0 platform with the RTMaps Embedded framework.Item Bilateral and adaptive loop filter implementations in 3D-high efficiency video coding standard(2015-09) Amiri, Delaram; El-Sharkawy, Mohamed; King, Brian; Salama, Paul; Rizkalla, MaherIn this thesis, we describe a different implementation for in loop filtering method for 3D-HEVC. First we propose the use of adaptive loop filtering (ALF) technique for 3D-HEVC standard in-loop filtering. This filter uses Wiener–based method to minimize the Mean Squared Error between filtered pixel and original pixels. The performance of adaptive loop filter in picture based level is evaluated. Results show up to of 0.2 dB PSNR improvement in Luminance component for the texture and 2.1 dB for the depth. In addition, we obtain up to 0.1 dB improvement in Chrominance component for the texture view after applying this filter in picture based filtering. Moreover, a design of an in-loop filtering with Fast Bilateral Filter for 3D-HEVC standard is proposed. Bilateral filter is a filter that smoothes an image while preserving strong edges and it can remove the artifacts in an image. Performance of the bilateral filter in picture based level for 3D-HEVC is evaluated. Test model HTM- 6.2 is used to demonstrate the results. Results show up to of 20 percent of reduction in processing time of 3D-HEVC with less than affecting PSNR of the encoded 3D video using Fast Bilateral Filter.Item Biomedical concept association and clustering using word embeddings(2018-12) Shah, Setu; Luo, Xiao; El-Sharkawy, Mohamed; King, BrianBiomedical data exists in the form of journal articles, research studies, electronic health records, care guidelines, etc. While text mining and natural language processing tools have been widely employed across various domains, these are just taking off in the healthcare space. A primary hurdle that makes it difficult to build artificial intelligence models that use biomedical data, is the limited amount of labelled data available. Since most models rely on supervised or semi-supervised methods, generating large amounts of pre-processed labelled data that can be used for training purposes becomes extremely costly. Even for datasets that are labelled, the lack of normalization of biomedical concepts further affects the quality of results produced and limits the application to a restricted dataset. This affects reproducibility of the results and techniques across datasets, making it difficult to deploy research solutions to improve healthcare services. The research presented in this thesis focuses on reducing the need to create labels for biomedical text mining by using unsupervised recurrent neural networks. The proposed method utilizes word embeddings to generate vector representations of biomedical concepts based on semantics and context. Experiments with unsupervised clustering of these biomedical concepts show that concepts that are similar to each other are clustered together. While this clustering captures different synonyms of the same concept, it also captures the similarities between various diseases and the symptoms that those diseases are symptomatic of. To test the performance of the concept vectors on corpora of documents, a document vector generation method that utilizes these concept vectors is also proposed. The document vectors thus generated are used as an input to clustering algorithms, and the results show that across multiple corpora, the proposed methods of concept and document vector generation outperform the baselines and provide more meaningful clustering. The applications of this document clustering are huge, especially in the search and retrieval space, providing clinicians, researchers and patients more holistic and comprehensive results than relying on the exclusive term that they search for. At the end, a framework for extracting clinical information that can be mapped to electronic health records from preventive care guidelines is presented. The extracted information can be integrated with the clinical decision support system of an electronic health record. A visualization tool to better understand and observe patient trajectories is also explored. Both these methods have potential to improve the preventive care services provided to patients.Item BIoU: An Improved Bounding Box Regression for Object Detection(MDPI, 2022-09-28) Ravi, Niranjan; Naqvi, Sami; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyObject detection is a predominant challenge in computer vision and image processing to detect instances of objects of various classes within an image or video. Recently, a new domain of vehicular platforms, e-scooters, has been widely used across domestic and urban environments. The driving behavior of e-scooter users significantly differs from other vehicles on the road, and their interactions with pedestrians are also increasing. To ensure pedestrian safety and develop an efficient traffic monitoring system, a reliable object detection system for e-scooters is required. However, existing object detectors based on IoU loss functions suffer various drawbacks when dealing with densely packed objects or inaccurate predictions. To address this problem, a new loss function, balanced-IoU (BIoU), is proposed in this article. This loss function considers the parameterized distance between the centers and the minimum and maximum edges of the bounding boxes to address the localization problem. With the help of synthetic data, a simulation experiment was carried out to analyze the bounding box regression of various losses. Extensive experiments have been carried out on a two-stage object detector, MASK_RCNN, and single-stage object detectors such as YOLOv5n6, YOLOv5x on Microsoft Common Objects in Context, SKU110k, and our custom e-scooter dataset. The proposed loss function demonstrated an increment of 3.70% at 𝐴𝑃𝑆 on the COCO dataset, 6.20% at AP55 on SKU110k, and 9.03% at AP80 of the custom e-scooter dataset.Item Collision avoidance and Drone surveillance using Thread protocol in V2V and V2I communications(IEEE, 2019-07) Chitanvis, Rajas; Ravi, Niranjan; Zantye, Tanmay; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and TechnologyAccording to the World Health Organizations (WHO) report nearly 1.25 million people die in road accidents every year. This creates a need for Advanced Driver Assist Systems (ADAS) which can ensure safe travel. To tackle the above challenge in existing the ADAS, Intra-vehicular communications (V2V) and vehicle to infrastructure communications (V2I) has been one of the predominant research topics nowadays due to the rapid growth of automobile industries and ideology of producing autonomous cars in the near future. The key feature of V2V communication is vehicle to vehicle collision detection by transmitting information like vehicle speed and position of a vehicle to other vehicles in the same location using wireless sensor networks (WSN). On the other hand, Unmanned Aerial Vehicle (UAV) systems are growing at a rapid rate in various aspects of life including dispatch of medicines and undergo video surveillance during an emergency due to less air traffic. This paper demonstrates the practice of integrating V2V communication with Thread, one of the low power WSN for data transmission, to initiate adaptive cruise control in a vehicle during a crisis. Also, UAV systems are employed as a part of V2I system to provide aerial view video surveillance if any accident occurs.Item Comparing Pso-Based Clustering Over Contextual Vector Embeddings to Modern Topic Modeling(2022-05) Miles, Samuel; Ben Miled, Zina; Salama, Paul; El-Sharkawy, MohamedEfficient topic modeling is needed to support applications that aim at identifying main themes from a collection of documents. In this thesis, a reduced vector embedding representation and particle swarm optimization (PSO) are combined to develop a topic modeling strategy that is able to identify representative themes from a large collection of documents. Documents are encoded using a reduced, contextual vector embedding from a general-purpose pre-trained language model (sBERT). A modified PSO algorithm (pPSO) that tracks particle fitness on a dimension-by-dimension basis is then applied to these embeddings to create clusters of related documents. The proposed methodology is demonstrated on three datasets across different domains. The first dataset consists of posts from the online health forum r/Cancer. The second dataset is a collection of NY Times abstracts and is used to compare the proposed model to LDA. The third is a standard benchmark dataset for topic modeling which consists of a collection of messages posted to 20 different news groups. It is used to compare state-of-the-art generative document models (i.e., ETM and NVDM) to pPSO. The results show that pPSO is able to produce interpretable clusters. Moreover, pPSO is able to capture both common topics as well as emergent topics. The topic coherence of pPSO is comparable to that of ETM and its topic diversity is comparable to NVDM. The assignment parity of pPSO on a document completion task exceeded 90% for the 20News- Groups dataset. This rate drops to approximately 30% when pPSO is applied to the same Skip-Gram embedding derived from a limited, corpus-specific vocabulary which is used by ETM and NVDM.