3D Object Detection Using Virtual Environment Assisted Deep Network Training

dc.contributor.advisorChristopher, Lauren
dc.contributor.authorDale, Ashley S.
dc.contributor.otherKing, Brian
dc.contributor.otherSalama, Paul
dc.date.accessioned2021-01-05T16:11:20Z
dc.date.available2021-01-05T16:11:20Z
dc.date.issued2020-12
dc.degree.date2020en_US
dc.degree.disciplineElectrical & Computer Engineeringen
dc.degree.grantorPurdue Universityen_US
dc.degree.levelM.S.E.C.E.en_US
dc.descriptionIndiana University-Purdue University Indianapolis (IUPUI)en_US
dc.description.abstractAn RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score over all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ_F1* = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Varying the backgrounds of the synthetic data was shown to have a negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and only a small bias towards clustering based on image background.en_US
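
The thesis itself is the authoritative source for methodology; the two Python sketches below are illustrative reconstructions only, not the author's code. The first shows the transfer-learning configuration the abstract describes, assuming the torchvision implementation of Mask R-CNN: the network is initialized with MS COCO weights, the backbone is frozen, and only the heads are trained. The dummy image and target are placeholders for batches drawn from the mixed synthetic/real dataset. (The F1-score being compared is the harmonic mean of precision and recall, F1 = 2PR/(P + R).)

    import torch
    import torchvision

    # Mask R-CNN pre-trained on MS COCO; the thesis likewise starts from
    # MS COCO weights before fine-tuning.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # Freeze the backbone so that only the heads (RPN, box, and mask
    # predictors) receive gradient updates, mirroring a heads-only regime.
    for p in model.backbone.parameters():
        p.requires_grad = False

    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad],
        lr=0.005, momentum=0.9)

    # One illustrative training step; a real run would draw batches from
    # the combined synthetic + real dataset instead of this dummy sample.
    model.train()
    images = [torch.rand(3, 480, 640)]
    masks = torch.zeros(1, 480, 640, dtype=torch.uint8)
    masks[0, 100:200, 100:200] = 1
    targets = [{
        "boxes": torch.tensor([[100.0, 100.0, 200.0, 200.0]]),
        "labels": torch.tensor([1]),
        "masks": masks,
    }]
    loss_dict = model(images, targets)  # dict of per-head losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The second sketch covers the latent-space analysis: latent vectors (here random stand-ins for the VAE encoder outputs) are projected to two dimensions with PCA and with UMAP, the two methods named in the abstract.

    import numpy as np
    from sklearn.decomposition import PCA
    import umap  # provided by the umap-learn package

    # Random stand-ins for the VAE latent vectors (500 images, 32-dim space).
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(500, 32))

    pca_2d = PCA(n_components=2).fit_transform(latents)         # linear projection
    umap_2d = umap.UMAP(n_components=2).fit_transform(latents)  # nonlinear embedding
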
dc.identifier.urihttps://hdl.handle.net/1805/24756
dc.identifier.urihttp://dx.doi.org/10.7912/C2/2594
dc.language.isoen_USen_US
dc.rightsAttribution-ShareAlike 4.0 International*
dc.rights.urihttp://creativecommons.org/licenses/by-sa/4.0/*
dc.subjectMachine Learningen_US
dc.subjectMASK R-CNNen_US
dc.subjectARTIFICIAL INTELLIGENCEen_US
dc.subjectIMAGE PROCESSINGen_US
dc.subject3D IMAGEen_US
dc.subjectSIGNAL PROCESSINGen_US
dc.subjectOBJECT DETECTIONen_US
dc.subjectTHREAT DETECTIONen_US
dc.subjectVIRTUAL ENVIRONMENTSen_US
dc.subjectSYNTHETIC DATASETen_US
dc.subjectIMAGE SEGMENTATIONen_US
dc.subjectRGBDen_US
dc.subjectRGBD VIDEOen_US
dc.subjectRGBZen_US
dc.subjectALGORITHMen_US
dc.subjectMS COCOen_US
dc.subjectTRANSFER LEARNINGen_US
dc.title3D Object Detection Using Virtual Environment Assisted Deep Network Trainingen_US
dc.typeThesisen
Files
Original bundle
Name: 3D_OBJECT_DETECTION_USING_VIRTUAL_ENVIRONMENT_ASSISTED_DEEP_NETWORK_TRAINING.pdf
Size: 85.14 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon at submission