A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications

dc.contributor.advisor: Christopher, Lauren
dc.contributor.author: Vance, Lauren M.
dc.contributor.other: King, Brian
dc.contributor.other: Rizkalla, Maher
dc.date.accessioned: 2021-08-10T13:21:49Z
dc.date.available: 2021-08-10T13:21:49Z
dc.date.issued: 2021-08
dc.degree.date: 2021
dc.degree.discipline: Electrical & Computer Engineering
dc.degree.grantor: Purdue University
dc.degree.level: M.S.E.C.E.
dc.description: Indiana University-Purdue University Indianapolis (IUPUI)
dc.description.abstract: Deep learning solutions to computer vision tasks have revolutionized many industries in recent years, but embedded systems are too constrained to take advantage of current state-of-the-art configurations. Typical embedded processor hardware must meet strict power and memory constraints to maintain small, lightweight packaging, and the architectures of the current best deep learning models are too computationally intensive for such hardware. Current research shows that convolutional neural networks (CNNs) can be deployed on Field-Programmable Gate Arrays (FPGAs) with a few architectural modifications, resulting in minimal loss of accuracy, similar or decreased processing speeds, and lower power consumption compared to general-purpose Central Processing Units (CPUs) and Graphics Processing Units (GPUs). This research contributes further to these findings with an FPGA implementation of a YOLOv4 object detection model developed using transfer learning. The transfer-learned model uses the weights of a model pre-trained on the MS-COCO dataset as a starting point, then fine-tunes only the output layers to detect more specific objects of five classes. The model architecture was then modified slightly for compatibility with the FPGA hardware using techniques such as weight quantization and replacement of unsupported activation layer types. The model was deployed on three hardware setups (CPU, GPU, FPGA) for inference on a test set of 100 images. The FPGA achieved real-time inference speeds of 33.77 frames per second, a speedup of 7.74 frames per second over GPU deployment. The model also consumed 96% less power than the GPU configuration, with only approximately 4% average loss in accuracy across all five classes. The results are even more striking compared to CPU deployment, with a 131.7-fold speedup in inference throughput. CPUs have long been outperformed by GPUs for deep learning applications, yet they remain the processors used in most embedded systems. These results further illustrate the advantages of FPGAs for deep learning inference on embedded systems, even when transfer learning is used for an efficient end-to-end deployment process. This work advances the current state of the art with the implementation of a YOLOv4 object detection model developed with transfer learning for FPGA deployment.
dc.identifier.uri: https://hdl.handle.net/1805/26436
dc.identifier.uri: http://dx.doi.org/10.7912/C2/62
dc.language.iso: en_US
dc.rights: Attribution-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nd/4.0/
dc.subject: deep learning
dc.subject: computer vision
dc.subject: object detection
dc.subject: embedded systems
dc.subject: fpga
dc.subject: transfer learning
dc.title: A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications
dc.type: Thesis
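
Note on the techniques named in the abstract: the record does not include the thesis code, so the following is only a minimal Python sketch of the three model-preparation steps the abstract describes (fine-tuning only the output layers of the MS-COCO pre-trained network, weight quantization, and replacement of unsupported activation layers), plus the frames-per-second measurement used for the hardware comparison. The framework choice (PyTorch), the "head" parameter-name prefix, the load_pretrained_yolov4 loader, and the LeakyReLU stand-in for Mish are all assumptions for illustration, not the author's implementation.

import time

import torch
import torch.nn as nn

def freeze_all_but_head(model: nn.Module, head_prefix: str = "head") -> None:
    # Transfer learning as described in the abstract: keep the MS-COCO
    # pre-trained weights fixed and fine-tune only the output layers.
    # The "head" prefix is a hypothetical naming convention.
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)

def quantize_weights_int8(weights: torch.Tensor):
    # Symmetric per-tensor int8 post-training quantization: the general
    # kind of weight quantization applied for FPGA compatibility.
    scale = weights.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(weights / scale).clamp(-127, 127).to(torch.int8)
    return q, scale  # approximate weights are recovered as q.float() * scale

def replace_unsupported_activations(module: nn.Module) -> None:
    # Swap activation types the accelerator cannot execute (e.g. YOLOv4's
    # Mish) for a supported near-equivalent; LeakyReLU is a common stand-in,
    # assumed here rather than taken from the thesis.
    for name, child in module.named_children():
        if isinstance(child, nn.Mish):
            setattr(module, name, nn.LeakyReLU(0.1))
        else:
            replace_unsupported_activations(child)

def measure_fps(infer, images) -> float:
    # Throughput measurement matching the abstract's frames-per-second
    # comparison across the CPU, GPU, and FPGA deployments.
    start = time.perf_counter()
    for image in images:
        infer(image)
    return len(images) / (time.perf_counter() - start)

# Usage sketch (the loader is hypothetical):
# model = load_pretrained_yolov4(num_classes=5)
# freeze_all_but_head(model)
# replace_unsupported_activations(model)
# fps = measure_fps(lambda x: model(x), test_images)

As a consistency check drawn from the abstract's own numbers, a speedup of 7.74 frames per second over GPU deployment alongside the FPGA's 33.77 frames per second implies the GPU ran at roughly 26.03 frames per second on the same 100-image test set.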
Files

Original bundle (1 of 1)
Name: A_Transfer_Learning_Approach_to_Object_Detection_Acceleration_for_Embedded_Applications.pdf
Size: 1.28 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission