Browsing by Author "King, Brian"
Now showing 1 - 10 of 141
Item: 3-D Scene Reconstruction for Passive Ranging Using Depth from Defocus and Deep Learning (2019-08)
Emerson, David R.; Christopher, Lauren A.; Ben Miled, Zina; King, Brian; Salama, Paul

Depth estimation is becoming increasingly important in computer vision. Autonomous systems must gauge their surroundings in order to avoid obstacles, preventing damage to themselves and/or other systems or people. Depth measuring/estimation systems that use multiple cameras from multiple views can be expensive and extremely complex, and as autonomous systems decrease in size and available power, the supporting sensors required to estimate depth must also shrink in size and power consumption. This research concentrates on a single passive method known as Depth from Defocus (DfD), which uses an in-focus and an out-of-focus image to infer the depth of objects in a scene. The major contribution of this research is the introduction of a new Deep Learning (DL) architecture that processes the in-focus and out-of-focus images to produce a depth map for the scene, improving both speed and performance over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph cuts algorithm applied to the synthetically blurred dataset, the DfD-Net produced a 34.30% improvement in the average Normalized Root Mean Square Error (NRMSE). Similarly, the DfD-Net architecture produced a 76.69% improvement in the average Normalized Mean Absolute Error (NMAE). Only the Structural Similarity Index (SSIM) showed a small average decrease of 2.68% compared to the graph cuts algorithm. This slight reduction in the SSIM value results from the SSIM metric penalizing images that appear noisy: in some instances the DfD-Net output is mottled, which the SSIM metric interprets as noise.

This research introduces two methods of deep learning architecture optimization. The first method employs a variant of the Particle Swarm Optimization (PSO) algorithm to improve the performance of the DfD-Net architecture. The PSO algorithm found a combination of the number of convolutional filters, the size of the filters, the activation layers used, the use of a batch normalization layer between filters, and the size of the input image used during training that produced a network architecture whose average NRMSE was approximately 6.25% better than the baseline DfD-Net average NRMSE. This optimized architecture also produced an average NMAE that was 5.25% better than the baseline DfD-Net average NMAE. Only the SSIM metric did not gain in performance, dropping by 0.26% compared to the baseline DfD-Net average SSIM value. The second method uses a Self-Organizing Map clustering method to reduce the number of convolutional filters in the DfD-Net, reducing the overall run time of the architecture while retaining the network performance exhibited prior to the reduction. The reduced DfD-Net architecture has a run-time decrease of between 14.91% and 44.85%, depending on the hardware architecture running the network. The final reduced DfD-Net had an overall decrease in the average NRMSE value of approximately 3.4% compared to the baseline, unaltered DfD-Net mean NRMSE value. The NMAE and SSIM results for the reduced architecture were 0.65% and 0.13% below the baseline results, respectively. This illustrates that reducing the complexity of a network architecture does not necessarily entail a reduction in performance.

Finally, this research introduced a new, real-world dataset captured using a camera with a voltage-controlled microfluidic lens for the visual data and a 2-D scanning LIDAR for the ground-truth data. The visual data consists of images captured at seven different exposure times and 17 discrete voltage steps per exposure time. The objects in this dataset were divided into four repeating scene patterns in which the same surfaces were used. These scenes were located between 1.5 and 2.5 meters from the camera and LIDAR, so that any of the deep learning algorithms tested would see the same texture at multiple depths and multiple blurs. The DfD-Net architecture was employed in two separate tests using the real-world dataset. The first test synthetically blurred the real-world dataset and assessed the performance of the DfD-Net trained on the Middlebury dataset. For scenes between 1.5 and 2.2 meters from the camera, the DfD-Net trained on the Middlebury dataset produced average NRMSE, NMAE, and SSIM values that exceeded its test results on the Middlebury test set. The second test trained and tested solely on the real-world dataset. Analysis of the camera and lens behavior led to an optimal lens voltage step configuration of 141 and 129. Using this configuration, training the DfD-Net resulted in an average NRMSE, NMAE, and SSIM of 0.0660, 0.0517, and 0.8028, with standard deviations of 0.0173, 0.0186, and 0.0641, respectively.
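The abstract above reports depth-map quality as NRMSE, NMAE, and SSIM. The following is a minimal sketch of how such metrics might be computed for a predicted depth map against LIDAR ground truth; the normalization by the ground-truth dynamic range and the use of scikit-image for SSIM are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np
from skimage.metrics import structural_similarity

def depth_map_metrics(pred, truth):
    """Score a predicted depth map against ground truth (both 2-D float arrays)."""
    rng = truth.max() - truth.min()                     # assumed normalization range
    nrmse = np.sqrt(np.mean((pred - truth) ** 2)) / rng
    nmae = np.mean(np.abs(pred - truth)) / rng
    ssim = structural_similarity(pred, truth, data_range=rng)  # penalizes mottled output
    return nrmse, nmae, ssim
```

As the abstract notes, SSIM can drop even while NRMSE and NMAE improve, because mottled network output registers as noise under SSIM's local-statistics comparison.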
Item: 3D Endoscopy Video Generated Using Depth Inference: Converting 2D to 3D (2013-08-20)
Rao, Swetcha; Christopher, Lauren; Rizkalla, Maher E.; Salama, Paul; King, Brian

A novel algorithm was developed to convert raw 2-dimensional endoscope videos into a 3-dimensional view. Minimally invasive surgeries aided by a 3D view of the in-vivo site have been shown to reduce errors and improve training time compared to those with a 2D view. The novelty of this algorithm is that two cues in the images are used to develop the 3D view. Illumination is the first cue, used to find the darkest regions in the endoscopy images in order to locate the vanishing point(s). The second cue is the presence of ridge-like structures in the in-vivo images of the endoscopy image sequence. Edge detection is used to map these ridge-like structures into concentric ellipses with their common center at the darkest spot. These two observations are then used to infer the depth of the endoscopy videos, which in turn serves to convert them from 2D to 3D. The processing time is between 21 seconds and 20 minutes per frame on a 2.27 GHz CPU, depending on the number of edge pixels present in the edge-detection image. The accuracy of ellipse detection was measured at 98.98% to 99.99%. The algorithm was tested on 3 truth images with known ellipse parameters and on real bronchoscopy image sequences from two surgical procedures. Of the 1020 frames tested in total, 688 frames had a single vanishing point while 332 frames had two vanishing points. Our algorithm detected the single vanishing point in 653 of the 688 frames and both vanishing points in 322 of the 332 frames.
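The two cues described above (darkest region as a vanishing-point cue, edges as the ridge-structure cue) map naturally onto standard image operations. Below is a small illustrative sketch using OpenCV; the blur kernel size and Canny thresholds are arbitrary choices for illustration, not parameters from the thesis.

```python
import cv2
import numpy as np

def vanishing_point_cues(frame_gray):
    """Return the darkest-spot location (vanishing-point cue) and an edge map
    (ridge-structure cue) for one grayscale endoscopy frame."""
    smoothed = cv2.GaussianBlur(frame_gray, (31, 31), 0)  # suppress specular noise
    _, _, dark_loc, _ = cv2.minMaxLoc(smoothed)           # center of darkest region
    edges = cv2.Canny(frame_gray, 50, 150)                # ridge-like structures
    return dark_loc, edges

# Demo on a synthetic frame; real input would be a grayscale bronchoscopy image.
demo = np.full((240, 320), 200, np.uint8)
cv2.circle(demo, (160, 120), 30, 40, -1)  # dark disc standing in for the lumen
loc, edge_map = vanishing_point_cues(demo)
```

Ellipses concentric about `dark_loc` could then be fitted to the edge pixels (e.g., with `cv2.fitEllipse` on extracted contours) to build the depth ordering.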
Item: 3D Object Detection Using Virtual Environment Assisted Deep Network Training (2020-12)
Dale, Ashley S.; Christopher, Lauren; King, Brian; Salama, Paul

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score is σ∗ = 0.015 for the synthetically trained networks, compared to σ = 0.020 for the networks trained exclusively with real data. Varied backgrounds in the synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.

Item: 3D Terrain Visualization and CPU Parallelization of Particle Swarm Optimization (2018)
Wieczorek, Calvin L.; Christopher, Lauren; King, Brian; Lee, John

Particle Swarm Optimization (PSO) is a bio-inspired optimization technique used to approximately solve the non-deterministic polynomial (NP) problem of asset allocation in 3D space, frequency, antenna azimuth [1], and elevation orientation [1]. This research uses Qt Data Visualization to display the PSO solutions, assets, and transmitters in 3D space, building on the work done in [2]. Elevation and imagery data were extracted from ArcGIS (a geographic information system (GIS) database) so that the 3D visualization displays proper topographical data. The 3D environment range was improved and is now dynamic, giving the user appropriate coordinates based on the ArcGIS latitude and longitude ranges. The second part of the research improves the PSO's runtime by using OpenMP with CPU threading to parallelize the evaluation of the PSO by particle. This implementation uses CPU multithreading with 4 threads to improve the performance of the PSO by 42% to 51% in comparison to running the PSO without CPU multithreading. The contributions provided allow the PSO project to more realistically simulate its use in the Electronic Warfare (EW) space, with the CPU multithreading implementation offering further performance improvements.
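The per-particle parallelization described in the terrain-visualization item above is implemented with OpenMP in the original work. The sketch below shows the same idea, evaluating each particle's fitness on a separate worker, using Python's multiprocessing module as a stand-in; the objective function is a placeholder, since the real asset-allocation cost is problem-specific.

```python
from multiprocessing import Pool

def fitness(particle):
    # Placeholder objective; the thesis's asset-allocation cost would go here.
    return sum(x * x for x in particle)

def evaluate_swarm(swarm, workers=4):
    """Evaluate all PSO particles in parallel, mirroring the 4-thread OpenMP loop."""
    with Pool(processes=workers) as pool:
        return pool.map(fitness, swarm)  # one fitness evaluation per particle

if __name__ == "__main__":
    swarm = [[0.5, 1.0, -2.0], [3.0, 0.1, 0.2], [1.5, -1.5, 0.0]]
    print(evaluate_swarm(swarm))
```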
Item: A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection (2024-05)
Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher

Autonomous vehicle technology has taken a significant leap with an improved version of the Multimodal Vehicle Detection Network (MVDNet), distinguished by the integration of a multi-head attention layer. This enhancement significantly refines the network's capability to process and integrate multimodal sensor data, which becomes crucial under challenging weather conditions. The effectiveness of the upgraded Multi-Head MVDNet is verified against an extensive dataset acquired from the Oxford Radar Robotcar (ORR). In complex environmental conditions, the Multi-Head MVDNet shows marked superiority in Average Precision (AP) compared to existing models, underscoring its advanced detection capabilities. The transition from the traditional MVDNet to the Multi-Head Vehicle Detection Network represents a notable advance in vehicle detection technologies, with special emphasis on operation under severe meteorological conditions such as dense fog or heavy snowfall. The enhancement builds on the foundational principles of the original MVDNet, which amalgamates the individual strengths of lidar and radar sensors through a refined process of feature tensor fusion, creating a robust and comprehensive sensory data interpretation framework. The major innovation in this updated model is a multi-head attention layer that replaces the previously employed self-attention mechanism; segmenting the attention mechanism into several distinct partitions improves the network's efficiency and accuracy in processing and interpreting large arrays of sensor data. An exhaustive series of experiments explored various combinations and settings to determine the optimal configuration of this multi-head attention mechanism, ultimately identifying seven distinct attention heads as the most effective, a setup that optimizes the balance between computational efficiency and detection accuracy. Tested on the radar and lidar datasets from the ORR project, this Multi-Head MVDNet configuration consistently surpassed the performance of the original MVDNet and showed marked improvements over models relying solely on lidar data or on the DEF models, especially in vehicular detection accuracy. This enhancement not only represents a significant leap in autonomous vehicle detection but also lays a foundation for future research, opening new pathways for exploring various attention mechanisms in scenarios requiring real-time vehicle detection. Furthermore, it underscores the importance of sophisticated sensor fusion techniques as vital tools in overcoming the challenges posed by adverse environmental conditions, paving the way for more resilient and reliable autonomous vehicular technologies.
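The core architectural change described above, replacing single-head self-attention with a seven-head attention layer over the fused lidar/radar feature tensor, can be sketched directly with PyTorch's built-in module. Only the head count of seven comes from the abstract; the embedding dimension and token shape below are hypothetical (PyTorch requires the embedding dimension to be divisible by the number of heads).

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 224, 7  # 7 heads per the experiments; 224 is a hypothetical dim
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Dummy fused lidar/radar feature tokens: (batch, sequence, embedding).
fused = torch.randn(2, 100, embed_dim)
out, attn_weights = attention(fused, fused, fused)  # self-attention split across 7 heads
print(out.shape)  # torch.Size([2, 100, 224])
```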
Item: Acoustic Simultaneous Localization and Mapping (SLAM) (2021-12)
Madan, Akul; Li, Lingxi; Chen, Yaobin; King, Brian

The current technologies employed for autonomous driving provide tremendous performance and results, but the technology itself is far from mature and relatively expensive. Some of the most commonly used components for autonomous driving include LiDAR, cameras, radar, and ultrasonic sensors. Such sensors are usually high-priced and often require a tremendous amount of computational power to process the gathered data. Many car manufacturers consider cameras a low-cost alternative to some other costly sensors, but camera-based sensors alone are prone to fatal perception errors, and adverse weather and night-time conditions hinder the performance of many vision-based sensors. For a sensor to be a reliable source of data, the difference between actual data values and measured or perceived values should be as low as possible. Lowering the number of sensors used provides more economic freedom to invest in the reliability of the components used. This thesis provides an alternative approach to current autonomous driving methodologies by utilizing the acoustic signatures of moving objects. The approach uses a microphone array to collect and process acoustic signatures for simultaneous localization and mapping (SLAM). Rather than using numerous sensors to gather information about surroundings beyond the reach of the user, this method investigates the benefits of using the sound waves of objects around the host vehicle for SLAM. The components used in this model are cost-efficient and generate data that is easy to process without requiring high processing power. The results show benefits to this approach in terms of cost efficiency and low computational power. The functionality of the model is demonstrated using MATLAB for data collection and testing.

Item: Advancing Profiling Sensors with a Wireless Approach (2013-11-20)
Galvis, Alejandro; Russomanno, David J.; Li, Feng; Rizkalla, Maher E.; King, Brian

In general, profiling sensors are low-cost crude imagers that typically utilize a sparse detector array, whereas traditional cameras employ a dense focal-plane array. Profiling sensors are of particular interest in applications that require classification of a sensed object into broad categories, such as human, animal, or vehicle, but they have many other applications in which reliable classification of a crude silhouette or profile produced by the sensor is of value. The notion of a profiling sensor was first realized by a Near-Infrared (N-IR), retro-reflective prototype consisting of a vertical column of sparse detectors. Alternative arrangements have been implemented in which a subset of the detectors is offset from the vertical column and placed at arbitrary locations along the anticipated path of the objects of interest. All prior work with N-IR, retro-reflective profiling sensors has used wired detectors. This thesis surveys that prior work and advances it with a wireless profiling sensor prototype in which each detector is a wireless sensor node, and the aggregation of these nodes comprises the profiling sensor's field of view. In this novel approach, a base station pre-processes the data collected from the sensor nodes, including data realignment, prior to classification by a back-propagation neural network. Such a wireless detector configuration advances deployment options for N-IR, retro-reflective profiling sensors.
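The wireless pipeline in the profiling-sensor thesis above (realign per-node detector streams at the base station, then classify the aggregated silhouette with a back-propagation network) might look like the following sketch. The offsets, array sizes, and the use of scikit-learn's MLP are hypothetical stand-ins for the thesis's actual realignment step and network.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def realign(node_streams, offsets):
    """Shift each wireless node's detector stream by its spatial offset so the
    aggregated rows form a coherent profile (offsets assumed pre-calibrated)."""
    return np.stack([np.roll(stream, -off) for stream, off in zip(node_streams, offsets)])

# Hypothetical training data: flattened binary profiles -> broad object categories.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(90, 16 * 32))           # 90 example silhouettes
y = rng.choice(["human", "animal", "vehicle"], 90)   # placeholder labels
classifier = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
```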
Item: Airgap-less Electric Motor (2021-08)
Alibeik, Maryam; dos Santos, Euzeli; King, Brian; Li, Lingxi; Rovnyak, Steven

This dissertation focuses on the airgap-less electric machine. An extensive literature review is presented along with a systematic study that includes analytical modeling, simulation with both steady-state and transient analysis, prototype building, and experimental validation. In this type of device, the rotor is allowed to touch the stator at a contact point, which maximizes the internal flux and therefore the electromagnetic torque. A higher torque density motor is proposed due to the reduced reluctance of the zero-airgap condition. A comparison with other high-torque-density electric machines, such as switched reluctance motors for hybrid vehicles, integrated magnetic gears, and induction machines, demonstrates the advantages of the proposed machine. This machine maximizes the generated torque, allowing machines of this type to be competitive in applications where hydraulic motors are prevalent, i.e., low-speed and high-torque requirements. Hydraulic motor systems face two major problems: their braking system and low efficiency due to a large number of energy conversion stages (i.e., the motor-pump, hydraulic connections, and the hydraulic motor itself). The proposed electric motor, unlike hydraulic motors, converts electrical energy directly to mechanical energy, with no extra braking system necessary and with higher efficiency. The evolution of the airgap-less electric machine from three poles to nine bipoles is discussed, and the modeling of this machine with a minimum number of poles is presented before a generalization. The simulation and analysis of the airgap-less electric motor were carried out using the Euler integration method as well as the fourth-order Runge-Kutta integration method, the latter for its higher precision. A proof-of-concept electric machine with nine magnetic bipoles was built to validate the theoretical assumptions.

Item: Analysis of Latent Space Representations for Object Detection (2024-08)
Dale, Ashley Susan; Christopher, Lauren; King, Brian; Salama, Paul; Rizkalla, Maher

Deep Neural Networks (DNNs) successfully perform object detection tasks, and the Convolutional Neural Network (CNN) backbone is a commonly used feature extractor before secondary tasks such as detection, classification, or segmentation. In a DNN model, the relationship between the features learned by the model from the training data and the features leveraged by the model during test and deployment has motivated the area of feature interpretability studies. The work presented here applies equally to white-box and black-box models and to any DNN architecture, and the metrics developed do not require any information beyond the feature vector generated by the feature extraction backbone. These are therefore the first methods capable of estimating black-box model robustness in terms of latent space complexity and the first methods capable of examining feature representations in the latent space of black-box models. This work contributes the following four novel methodologies and results. First, a method for quantifying the invariance and/or equivariance of a model using the training data shows that the representation of a feature in the model impacts model performance. Second, a method for quantifying an observed domain gap in a dataset using the latent feature vectors of an object detection model is paired with pixel-level augmentation techniques to close the gap between real and synthetic data; this improves the model's F1 score on a test set of outliers from 0.5 to 0.9. Third, a method for visualizing and quantifying similarities between the latent manifolds of two black-box models is used to correlate similar feature representations with increased success in the transferability of gradient-based attacks. Finally, a method for examining the global complexity of decision boundaries in black-box models is presented, where more complex decision boundaries are shown to correlate with increased model robustness to gradient-based and random attacks.
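Both Dale items in this listing analyze backbone feature vectors in a reduced latent space (the earlier one explicitly via PCA followed by UMAP) to compare real and synthetic data. A minimal sketch of that kind of projection pipeline is shown below; the component counts and the umap-learn library are assumptions, since the abstracts do not specify the implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
import umap  # from the umap-learn package

def project_latents(features, pca_dims=50):
    """Project backbone feature vectors to 2-D for inspecting real-vs-synthetic
    clustering; PCA first reduces dimensionality/noise before the UMAP embedding."""
    reduced = PCA(n_components=pca_dims).fit_transform(features)
    return umap.UMAP(n_components=2).fit_transform(reduced)

# Dummy latent vectors standing in for real backbone features.
embedding = project_latents(np.random.rand(200, 1024))
print(embedding.shape)  # (200, 2)
```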
Item: Applying Different Wide-Area Response-Based Controls to Different Contingencies in Power Systems (2019-08)
Iranmanesh, Shahrzad; Rovnyak, Steven; King, Brian; dos Santos, Euzeli Cipriano

Electrical disturbances in the power system threaten the stability of the system. The first step is to detect these electrical disturbances, or events; the next step is to apply a proper control to the system to decrease the consequences of the disturbance. One-shot control is one of the effective methods for stabilizing events: an appropriate amount of load is added to or shed from the electrical system. Determining the amounts of load, and the locations for shedding, is crucial. Moreover, some control combinations are more effective for some events and less effective for others. This project therefore has two parts: first, finding effective control combinations; second, finding an algorithm for applying different control combinations to different contingencies in real time. To find effective control combinations, sensitivity analysis is employed to locate the most effective loads in the system, and gradient descent and the PSO algorithm are used to find the control combination commands. A pattern recognition method, the decision tree, is then used to apply the appropriate control combination for each event. The three most effective control combinations found by sensitivity analysis and the PSO method are used in the remainder of this study: a decision tree is trained for each of the three control combinations, and their outputs are combined into an algorithm for selecting the best control in real time. Finally, the algorithm is evaluated using a test set of contingencies. The final results reveal a 30% improvement in comparison to previous studies.
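A hedged sketch of the control-selection stage described above: one decision tree per control combination, with the trees' outputs combined to choose a control in real time. The feature set, labels, and the selection rule are hypothetical; only the structure (three trees, one per control combination) follows the abstract.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((300, 12))  # placeholder post-disturbance measurements per contingency
# One binary label set per control combination: 1 if that control stabilizes the event.
label_sets = [rng.integers(0, 2, 300) for _ in range(3)]

trees = [DecisionTreeClassifier(max_depth=5).fit(X, y) for y in label_sets]

def select_control(measurement):
    """Return the index of a control combination predicted to stabilize the event,
    or None if no tree predicts success (this selection rule is an assumption)."""
    votes = [int(tree.predict(measurement.reshape(1, -1))[0]) for tree in trees]
    return votes.index(1) if 1 in votes else None

print(select_control(rng.random(12)))
```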