Electrical & Computer Engineering Department Theses and Dissertations
Information about the Purdue School of Engineering and Technology Graduate Degree Programs available at IUPUI can be found at: http://www.engr.iupui.edu/academics.shtml
Browsing Electrical & Computer Engineering Department Theses and Dissertations by Title
Now showing 1 - 10 of 242
Item
A 2D PLUS DEPTH VIDEO CAMERA PROTOTYPE USING DEPTH FROM DEFOCUS IMAGING AND A SINGLE MICROFLUIDIC LENS (2011-08)
Li, Weixu; Christopher, Lauren; Rizkalla, Maher E.; Salama, Paul
A new method for capturing 3D video from a single imager and lens is introduced in this research. The benefit of this method is that it does not have the calibration and alignment issues associated with binocular 3D video cameras, and it allows for a less expensive overall system. The digital imaging technique Depth from Defocus (DfD) has been successfully used in still-camera imaging to develop a depth map associated with the image. However, DfD has not been applied to real-time video so far because the focus mechanisms are too slow to produce real-time results. This new research shows that a microfluidic lens is capable of the required focal-length changes at twice the video frame rate, due to the electrostatic control of the focus. During processing, two focus settings per output frame are captured using this lens combined with a broadcast video camera prototype. We show that the DfD technique using Bayesian Markov Random Field optimization can produce a valid depth map.
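The item above relies on capturing two focus settings per output frame and comparing their local blur. A minimal Python sketch of that comparison follows; the filter sizes and the ratio-based depth proxy are illustrative assumptions, and the Bayesian Markov Random Field optimization the thesis actually uses is not shown.

```python
# Illustrative sketch, not the thesis implementation: a crude depth-from-defocus cue
# computed from two frames taken at different focus settings.
import numpy as np
from scipy import ndimage

def local_sharpness(img, size=7):
    """High-frequency energy in a local window (Laplacian magnitude, box-averaged)."""
    lap = ndimage.laplace(img.astype(np.float64))
    return ndimage.uniform_filter(np.abs(lap), size=size)

def dfd_depth_proxy(frame_near_focus, frame_far_focus, size=7, eps=1e-6):
    """Relative depth cue in [0, 1]: regions sharper in the near-focus frame score near 0,
    regions sharper in the far-focus frame score near 1."""
    s_near = local_sharpness(frame_near_focus, size)
    s_far = local_sharpness(frame_far_focus, size)
    return s_far / (s_near + s_far + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    near = rng.random((120, 160))            # stand-in for the near-focus capture
    far = ndimage.gaussian_filter(near, 2)   # synthetically defocused stand-in for the far-focus capture
    depth = dfd_depth_proxy(near, far)
    print(depth.shape, float(depth.min()), float(depth.max()))
```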
Item
3-D Scene Reconstruction for Passive Ranging Using Depth from Defocus and Deep Learning (2019-08)
Emerson, David R.; Christopher, Lauren A.; Ben Miled, Zina; King, Brian; Salama, Paul
Depth estimation is becoming increasingly important in computer vision. Autonomous systems must gauge their surroundings in order to avoid obstacles and prevent damage to themselves and/or other systems or people. Depth measuring/estimation systems that use multiple cameras from multiple views can be expensive and extremely complex, and as these autonomous systems decrease in size and available power, the supporting sensors required to estimate depth must also shrink in size and power consumption. This research concentrates on a single passive method known as Depth from Defocus (DfD), which uses an in-focus and an out-of-focus image to infer the depth of objects in a scene. The major contribution of this research is the introduction of a new Deep Learning (DL) architecture to process the in-focus and out-of-focus images to produce a depth map for the scene, improving both speed and performance over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph cuts algorithms applied to the synthetically blurred dataset, the DfD-Net produced a 34.30% improvement in the average Normalized Root Mean Square Error (NRMSE). Similarly, the DfD-Net architecture produced a 76.69% improvement in the average Normalized Mean Absolute Error (NMAE). Only the Structural Similarity Index (SSIM) had a small average decrease of 2.68% when compared to the graph cuts algorithm. This slight reduction in the SSIM value is a result of the SSIM metric penalizing images that appear to be noisy; in some instances the DfD-Net output is mottled, which is interpreted as noise by the SSIM metric. This research introduces two methods of deep learning architecture optimization. The first method employs a variant of the Particle Swarm Optimization (PSO) algorithm to improve the performance of the DfD-Net architecture. The PSO algorithm was able to find a combination of the number of convolutional filters, the size of the filters, the activation layers used, the use of a batch normalization layer between filters, and the size of the input image used during training that produced a network architecture with an average NRMSE approximately 6.25% better than the baseline DfD-Net average NRMSE. This optimized architecture also resulted in an average NMAE that was 5.25% better than the baseline DfD-Net average NMAE. Only the SSIM metric did not see a gain in performance, dropping by 0.26% when compared to the baseline DfD-Net average SSIM value. The second method uses a Self-Organizing Map clustering method to reduce the number of convolutional filters in the DfD-Net, reducing the overall run time of the architecture while still retaining the network performance exhibited prior to the reduction. This method produces a reduced DfD-Net architecture with a run-time decrease of between 14.91% and 44.85%, depending on the hardware architecture running the network. The final reduced DfD-Net had an overall decrease in the average NRMSE value of approximately 3.4% when compared to the baseline, unaltered DfD-Net mean NRMSE value. The NMAE and SSIM results for the reduced architecture were 0.65% and 0.13% below the baseline results, respectively. This illustrates that reducing the network architecture complexity does not necessarily degrade performance. Finally, this research introduced a new real-world dataset captured using a camera with a voltage-controlled microfluidic lens for the visual data and a 2-D scanning LIDAR for the ground-truth data. The visual data consists of images captured at seven different exposure times and 17 discrete voltage steps per exposure time. The objects in this dataset were divided into four repeating scene patterns in which the same surfaces were used. These scenes were located between 1.5 and 2.5 meters from the camera and LIDAR. This was done so that any of the deep learning algorithms tested would see the same texture at multiple depths and multiple blurs. The DfD-Net architecture was employed in two separate tests using the real-world dataset. The first test was the synthetic blurring of the real-world dataset and assessing the performance of the DfD-Net trained on the Middlebury dataset. For the scenes that were between 1.5 and 2.2 meters from the camera, the DfD-Net trained on the Middlebury dataset produced average NRMSE, NMAE, and SSIM values that exceeded the test results of the DfD-Net tested on the Middlebury test set. The second test was training and testing solely on the real-world dataset. Analysis of the camera and lens behavior led to an optimal lens voltage step configuration of 141 and 129. Using this configuration, training the DfD-Net resulted in an average NRMSE, NMAE, and SSIM of 0.0660, 0.0517, and 0.8028, with standard deviations of 0.0173, 0.0186, and 0.0641, respectively.
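The item above reports results in terms of NRMSE and NMAE. A small sketch of those two depth-map metrics follows; normalizing by the ground-truth range is an assumption, since the abstract does not state the exact normalization used.

```python
# Illustrative sketch of the NRMSE and NMAE depth-map metrics named above,
# normalized here by the ground-truth value range (an assumption).
import numpy as np

def nrmse(pred, truth):
    """Root mean square error normalized by the ground-truth range."""
    value_range = truth.max() - truth.min()
    return np.sqrt(np.mean((pred - truth) ** 2)) / value_range

def nmae(pred, truth):
    """Mean absolute error normalized by the ground-truth range."""
    value_range = truth.max() - truth.min()
    return np.mean(np.abs(pred - truth)) / value_range

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = rng.uniform(1.5, 2.5, size=(64, 64))             # depths in meters, as in the dataset above
    pred = truth + rng.normal(0.0, 0.05, size=truth.shape)   # noisy predicted depth map
    print(f"NRMSE={nrmse(pred, truth):.4f}  NMAE={nmae(pred, truth):.4f}")
```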
Item
3D EM/MPM MEDICAL IMAGE SEGMENTATION USING AN FPGA EMBEDDED DESIGN IMPLEMENTATION (2011-08)
Liu, Chao; Christopher, Lauren; Rizkalla, Maher E.; Salama, Paul
This thesis presents a Field Programmable Gate Array (FPGA) based embedded system which is used to achieve high-speed segmentation of 3D images. Segmentation is performed using the Expectation-Maximization with Maximization of Posterior Marginals (EM/MPM) Bayesian algorithm. In this system, the embedded processor controls a custom circuit which performs the MPM and portions of the EM algorithm. The embedded processor completes the EM algorithm and also controls image data transmission between the host computer and on-board memory. The whole system has been implemented on a Xilinx Virtex 6 FPGA and achieved over a 100-times improvement compared to standard desktop computing hardware.

Item
3D ENDOSCOPY VIDEO GENERATED USING DEPTH INFERENCE: CONVERTING 2D TO 3D (2013-08-20)
Rao, Swetcha; Christopher, Lauren; Rizkalla, Maher E.; Salama, Paul; King, Brian
A novel algorithm was developed to convert raw 2-dimensional endoscope videos into a 3-dimensional view. Minimally invasive surgeries aided by a 3D view of the in-vivo site have been shown to reduce errors and improve training time compared to those with a 2D view. The novelty of this algorithm is that two cues in the images are used to develop the 3D view. Illumination is the first cue, used to find the darkest regions in the endoscopy images in order to locate the vanishing point(s). The second cue is the presence of ridge-like structures in the in-vivo images of the endoscopy image sequence. Edge detection is used to map these ridge-like structures into concentric ellipses with their common center at the darkest spot. These two observations are then used to infer the depth of the endoscopy videos, which then serves to convert them from 2D to 3D. The processing time is between 21 seconds and 20 minutes per frame on a 2.27 GHz CPU; the time depends on the number of edge pixels present in the edge-detection image. The accuracy of ellipse detection was measured to be 98.98% to 99.99%. The algorithm was tested on 3 truth images with known ellipse parameters and also on real bronchoscopy image sequences from two surgical procedures. Out of 1020 frames tested in total, 688 frames had a single vanishing point while 332 frames had two vanishing points. Our algorithm detected the single vanishing point in 653 of the 688 frames and both vanishing points in 322 of the 332 frames.
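The endoscopy item above uses illumination as its first depth cue, locating vanishing points at the darkest regions of each frame. A minimal sketch of that single cue is below; the smoothing scale and the single-vanishing-point assumption are illustrative, and the ridge/ellipse cue is not shown.

```python
# Illustrative sketch of the illumination cue: find the darkest smoothed region of a frame
# as a candidate vanishing point. Not the thesis implementation.
import numpy as np
from scipy import ndimage

def darkest_region_center(gray_frame, sigma=15):
    """Return (row, col) of the minimum of a heavily smoothed luminance image."""
    smoothed = ndimage.gaussian_filter(gray_frame.astype(np.float64), sigma=sigma)
    return np.unravel_index(np.argmin(smoothed), smoothed.shape)

if __name__ == "__main__":
    # Synthetic frame: brightness increases away from a dark "lumen" centered at (80, 120).
    rows, cols = np.mgrid[0:240, 0:320]
    frame = np.hypot(rows - 80, cols - 120)
    print(darkest_region_center(frame))   # expected to be near (80, 120)
```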
Item
3D Image Segmentation Implementation on FPGA Using EM/MPM Algorithm (2010-12)
Sun, Yan; Christopher, Lauren; Rizkalla, Maher E.; Salama, Paul
In this thesis, 3D image segmentation is targeted to a Xilinx Field Programmable Gate Array (FPGA) and verified with extensive simulation. Segmentation is performed using the Expectation-Maximization with Maximization of the Posterior Marginals (EM/MPM) Bayesian algorithm. This algorithm segments the 3D image using neighboring pixels based on a Markov Random Field (MRF) model. This iterative algorithm is designed, synthesized, and simulated for the Xilinx FPGA, and a greater than 100-times speed improvement over standard desktop computer hardware is achieved. Three new techniques were key to achieving this speed: pipelined computational cores, sixteen parallel data paths, and a novel memory interface for maximizing the external memory bandwidth. Seven MPM segmentation iterations are matched to the external memory bandwidth required for a single source file read and a single segmented file write, plus a small amount of latency.

Item
3D Object Detection Using Virtual Environment Assisted Deep Network Training (2020-12)
Dale, Ashley S.; Christopher, Lauren; King, Brian; Salama, Paul
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ* = 0.015, compared to σ = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
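The object-detection item above summarizes performance with per-class maximum F1 scores averaged over classes. A short sketch of that bookkeeping is below; the detection counts are made up for illustration.

```python
# Illustrative sketch of the F1 summary used above: per-class F1 from true/false positives
# and false negatives, maximized over epochs and averaged over classes. Counts are invented.
import numpy as np

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

if __name__ == "__main__":
    # per_epoch_counts[class] = [(tp, fp, fn) per epoch]; two classes and three epochs for illustration
    per_epoch_counts = {
        "class_a": [(80, 12, 15), (85, 10, 12), (88, 9, 10)],
        "class_b": [(60, 20, 25), (66, 18, 20), (70, 15, 18)],
    }
    max_f1 = {cls: max(f1_score(*c) for c in counts) for cls, counts in per_epoch_counts.items()}
    print(max_f1)
    print("average maximum F1:", round(float(np.mean(list(max_f1.values()))), 3))
```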
Item
3D terrain visualization and CPU parallelization of particle swarm optimization (2018)
Wieczorek, Calvin L.; Christopher, Lauren; King, Brian; Lee, John
Particle Swarm Optimization (PSO) is a bio-inspired optimization technique used to approximately solve the non-deterministic polynomial (NP) problem of asset allocation in 3D space, frequency, antenna azimuth [1], and elevation orientation [1]. This research uses Qt Data Visualization to display the PSO solutions, assets, and transmitters in 3D space, building on the work done in [2]. Elevation and imagery data were extracted from ARCGIS (a geographic information system (GIS) database) and overlaid so that the 3D visualization displays proper topological data. The 3D environment range was improved and is now dynamic, giving the user appropriate coordinates based on the ARCGIS latitude and longitude ranges. The second part of the research improves the PSO's runtime performance, using OpenMP with CPU threading to parallelize the evaluation of the PSO by particle. This implementation uses CPU multithreading with 4 threads to improve the performance of the PSO by 42%-51% compared to running the PSO without CPU multithreading. The contributions provided allow the PSO project to more realistically simulate its use in the Electronic Warfare (EW) space, with the additional CPU multithreading implementation providing further performance improvements.
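The PSO item above parallelizes fitness evaluation by particle, using OpenMP with CPU threading in the original project. As a language-shifted illustration only, the sketch below does the same per-particle parallel evaluation in Python with multiprocessing; the toy objective, swarm size, and coefficients are assumptions, not the thesis configuration.

```python
# Illustrative sketch: PSO with per-particle fitness evaluation farmed out to a process pool,
# mirroring the "parallelize the evaluation by particle" idea above (OpenMP/C++ in the thesis).
import numpy as np
from multiprocessing import Pool

def fitness(position):
    """Toy objective: squared distance from an arbitrary target point."""
    return float(np.sum((position - np.array([1.0, -2.0, 0.5])) ** 2))

def pso(n_particles=32, dim=3, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.full(n_particles, np.inf)
    with Pool() as pool:                                  # one fitness task per particle
        for _ in range(iters):
            vals = np.array(pool.map(fitness, pos))
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            gbest = pbest[np.argmin(pbest_val)].copy()
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = pos + vel
    return gbest, float(np.min(pbest_val))

if __name__ == "__main__":
    best, val = pso()
    print("best position:", np.round(best, 3), "fitness:", round(val, 6))
```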
Item
A Multi-head Attention Approach with Complementary Multimodal Fusion for Vehicle Detection (2024-05)
Tabassum, Nujhat; El-Sharkawy, Mohamed; King, Brian; Rizkalla, Maher
The advancement of autonomous vehicle technologies has taken a significant leap with the development of an improved version of the Multimodal Vehicle Detection Network (MVDNet), distinguished by the integration of a multi-head attention layer. This key enhancement significantly refines the network's capability to process and integrate multimodal sensor data, an aspect that becomes crucial under challenging weather conditions. The effectiveness of this upgraded Multi-Head MVDNet is verified through an extensive dataset acquired from the Oxford Radar Robotcar, demonstrating its enhanced performance capabilities. Notably, in complex environmental conditions, the Multi-Head MVDNet shows a marked superiority in terms of Average Precision (AP) compared to existing models, underscoring its advanced detection capabilities. The transition from the traditional MVDNet to the enhanced Multi-Head Vehicle Detection Network signifies a notable breakthrough in vehicle detection technologies, with a special emphasis on operation under severe meteorological conditions such as dense fog or heavy snowfall. This enhancement builds on the foundational principles of the original MVDNet, which amalgamates the individual strengths of lidar and radar sensors through a refined process of feature tensor fusion, creating a more robust and comprehensive sensory data interpretation framework. A major innovation introduced in this updated model is the implementation of a multi-head attention layer, which serves as a replacement for the previously employed self-attention mechanism. Segmenting the attention mechanism into several distinct partitions enhances the network's efficiency and accuracy in processing and interpreting vast arrays of sensor data. An exhaustive series of experimental analyses was undertaken to determine the optimal configuration of this multi-head attention mechanism. These experiments explored various combinations and settings, ultimately identifying a configuration consisting of seven distinct attention heads as the most effective; this setup was found to optimize the balance between computational efficiency and detection accuracy. When tested using the radar and lidar datasets from the ORR project, this Multi-Head MVDNet configuration consistently demonstrated its superiority. It not only surpassed the performance of the original MVDNet but also showed marked improvements over models that relied solely on lidar data or the DEF models, especially in terms of vehicular detection accuracy. This enhancement in the MVDNet model, with its focus on multi-head attention, not only represents a significant leap in the field of autonomous vehicle detection but also lays a foundation for future research. It opens new pathways for exploring various attention mechanisms and their potential applicability in scenarios requiring real-time vehicle detection. Furthermore, it accentuates the importance of sophisticated sensor fusion techniques as vital tools in overcoming the challenges posed by adverse environmental conditions, thus paving the way for more resilient and reliable autonomous vehicular technologies.

Item
Acoustic Simultaneous Localization And Mapping (SLAM) (2021-12)
Madan, Akul; Li, Lingxi; Chen, Yaobin; King, Brian
The current technologies employed for autonomous driving provide tremendous performance and results, but the technology itself is far from mature and relatively expensive. Some of the most commonly used components for autonomous driving include LiDAR, cameras, radar, and ultrasonic sensors. Such sensors are usually high-priced and often require a tremendous amount of computational power to process the gathered data. Many car manufacturers consider cameras to be a low-cost alternative to some other costly sensors, but camera-based sensors alone are prone to fatal perception errors. In many cases, adverse weather and night-time conditions hinder the performance of some vision-based sensors. For a sensor to be a reliable source of data, the difference between actual data values and measured or perceived values should be as low as possible. Lowering the number of sensors used provides more economic freedom to invest in the reliability of the components used. This thesis provides an alternative approach to the current autonomous driving methodologies by utilizing acoustic signatures of moving objects. This approach makes use of a microphone array to collect and process acoustic signatures for simultaneous localization and mapping (SLAM). Rather than using numerous sensors to gather information about surroundings that are beyond the reach of the user, this method investigates the benefits of considering the sound waves of different objects around the host vehicle for SLAM. The components used in this model are cost-efficient and generate data that is easy to process without requiring high processing power. The results show that there are benefits in pursuing this approach in terms of cost efficiency and low computational power. The functionality of the model is demonstrated using MATLAB for data collection and testing.

Item
An Adaptive Eye Gaze Tracking System Without Calibration for Use in an Automobile (2011)
Rajabather, Harikrishna K.; Koskie, Sarah; Chen, Yaobin; Christopher, Lauren
One of the biggest hurdles to the development of an effective driver state monitor is that there is no real-time eye-gaze detection. This is primarily because such systems require calibration. In this thesis, the various aspects that comprise an eye gaze tracker are investigated. From this, we developed an eye gaze tracker for automobiles that does not require calibration. We used a monocular camera system with IR light sources placed in each of the three mirrors. The camera system created the bright-pupil effect for robust pupil detection and tracking. We developed an SVM-based algorithm for initial eye candidate detection; after that, the eyes were tracked using a hybrid Kalman/mean-shift algorithm. From the tracked pupils, various features such as the locations of the glints (reflections in the pupil from the IR light sources) were extracted. This information is then fed into a Generalized Regression Neural Network (GRNN), which maps it into one of thirteen gaze regions in the vehicle.
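The eye-gaze item above feeds glint and pupil features into a Generalized Regression Neural Network (GRNN) that maps them to one of thirteen gaze regions. A GRNN is essentially kernel-weighted (Nadaraya-Watson) regression over stored training samples; a minimal sketch follows, with the feature dimension, bandwidth, and training data invented for illustration.

```python
# Illustrative GRNN sketch: kernel-weighted averaging of stored one-hot region labels,
# mapping a feature vector to scores over 13 gaze regions. Data and parameters are invented.
import numpy as np

def grnn_predict(x, train_x, train_y, sigma=0.5):
    """train_x: (n, d) features; train_y: (n, 13) one-hot region labels; returns region scores."""
    d2 = np.sum((train_x - x) ** 2, axis=1)        # squared distances to each training sample
    w = np.exp(-d2 / (2.0 * sigma ** 2))           # RBF kernel weights
    return (w[:, None] * train_y).sum(axis=0) / (w.sum() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, d, regions = 200, 6, 13
    train_x = rng.normal(size=(n, d))                    # stand-in glint/pupil feature vectors
    labels = rng.integers(0, regions, size=n)
    train_y = np.eye(regions)[labels]                    # one-hot encode gaze regions
    query = train_x[0] + rng.normal(0.0, 0.05, size=d)   # query near a known sample
    scores = grnn_predict(query, train_x, train_y)
    print("predicted region:", int(np.argmax(scores)), "true region:", int(labels[0]))
```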