Browsing by Subject "embedded systems"
Now showing 1 - 3 of 3
CORBA-JS: An Open-Standards Framework for Distributed Object Computing over the Web
(Office of the Vice Chancellor for Research, 2013-04-05) Parulekar, Tejal B.; Feiock, Dennis C.; Hill, James H.

Distributed object computing (DOC) is a well-established software engineering paradigm for implementing distributed real-time and embedded (DRE) systems, such as real-time monitoring systems. Likewise, CORBA is a well-established DOC open standard used in DRE systems. Due to many technological limitations, DOC was traditionally unavailable in Web-based applications (i.e., stateful applications that communicate over HTTP and are accessible via a Web browser) without the use of proprietary, custom technologies. The problem with proprietary, custom technology is that it fragments the solution space, leaving some solutions unavailable to all end users (e.g., Web sites that work only within a certain Web browser because of the technology used). With the advent of HTML5 and WebSockets, an open standard for enabling two-way communication over HTTP, DOC now has the necessary technological foundations to be realized within Web applications without proprietary, custom technologies. To date, however, no researchers have attempted to apply DOC over HTTP using well-established DOC open standards such as CORBA. This research is therefore an initial investigation into implementing CORBA atop HTML5 and WebSockets. As part of this research, we are investigating the challenges in realizing the solution and proposing ways to improve the target programming languages and the CORBA specification.
Doing so will enable developers to create feature-rich, real-time Web applications that improve upon current state-of-the-art approaches, e.g., Asynchronous JavaScript and XML (AJAX), which are resource intensive (e.g., heavy in CPU, network bandwidth, and memory use) and hard to program.

Deployment of SE-SqueezeNext on NXP BlueBox 2.0 and NXP i.MX RT1060 MCU
(IEEE, 2020-08) Chappa, Ravi Teja N. V. S.; El-Sharkawy, Mohamed; Electrical and Computer Engineering, School of Engineering and Technology

Convolutional neural networks (CNNs) are being utilized in the field of autonomous driving vehicles and driver assistance systems (ADAS), and have made extraordinary progress. Before CNNs, conventional machine learning algorithms supported ADAS. Currently, considerable research is being done on DNNs such as MobileNet, SqueezeNext, and SqueezeNet, which have improved CNN designs and made them increasingly appropriate to implement on real-time embedded systems. Due to their model size and complexity, many models cannot be deployed straight away on real-time systems. The most important requirement is a small model size without a trade-off in accuracy. Squeeze-and-Excitation SqueezeNext, an efficient DNN with a best model accuracy of 92.60% and a smallest model size of 0.595 MB, is chosen to be deployed on the NXP BlueBox 2.0 and NXP i.MX RT1060. This deployment is very successful because of the model's small size and good accuracy. The model is trained and validated on the CIFAR-10 dataset.

A Transfer Learning Approach to Object Detection Acceleration for Embedded Applications
(2021-08) Vance, Lauren M.; Christopher, Lauren; King, Brian; Rizkalla, Maher

Deep learning solutions to computer vision tasks have revolutionized many industries in recent years, but embedded systems have too many restrictions to take advantage of current state-of-the-art configurations.
Typical embedded processor hardware configurations must meet very low power and memory constraints to maintain small and lightweight packaging, and the architectures of the current best deep learning models are too computationally intensive for these hardware configurations. Current research shows that convolutional neural networks (CNNs) can be deployed, with a few architectural modifications, on Field-Programmable Gate Arrays (FPGAs), resulting in minimal loss of accuracy, similar or decreased processing speeds, and lower power consumption when compared to general-purpose Central Processing Units (CPUs) and Graphics Processing Units (GPUs). This research contributes further to these findings with the FPGA implementation of a YOLOv4 object detection model that was developed with the use of transfer learning. The transfer-learned model uses the weights of a model pre-trained on the MS-COCO dataset as a starting point, then fine-tunes only the output layers for detection of more specific objects in five classes. The model architecture was then modified slightly for compatibility with the FPGA hardware, using techniques such as weight quantization and replacement of unsupported activation layer types. The model was deployed on three different hardware setups (CPU, GPU, FPGA) for inference on a test set of 100 images. It was found that the FPGA was able to achieve real-time inference speeds of 33.77 frames per second, a gain of 7.74 frames per second over GPU deployment. The model also consumed 96% less power than the GPU configuration, with only approximately 4% average loss in accuracy across all five classes. The results are even more striking when compared to CPU deployment, with a 131.7-fold speedup in inference throughput. CPUs have long since been outperformed by GPUs for deep learning applications but are used in most embedded systems.
These results further illustrate the advantages of FPGAs for deep learning inference on embedded systems, even when transfer learning is used for an efficient end-to-end deployment process. This work advances the current state of the art with the implementation of a YOLOv4 object detection model developed with transfer learning for FPGA deployment.
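The core idea behind the CORBA-JS item — carrying two-way distributed-object invocations over a WebSocket connection — can be sketched with a toy marshaling layer. This is a hypothetical illustration, not the CORBA-JS implementation: real CORBA marshals requests with the binary GIOP protocol, and the message fields, function names, and servant registry here are all invented for the sketch. JSON stands in for the payload of a WebSocket text frame.

```python
import json


def marshal_request(request_id, object_key, operation, args):
    """Pack a CORBA-style invocation into a text-frame payload.

    Illustrative only: real CORBA uses binary GIOP messages, not JSON.
    """
    return json.dumps({
        "type": "request",
        "id": request_id,
        "object": object_key,
        "op": operation,
        "args": args,
    })


def dispatch(payload, servants):
    """Server side: decode a request frame, invoke the target servant's
    operation, and marshal the reply frame."""
    msg = json.loads(payload)
    servant = servants[msg["object"]]          # locate the target object
    result = getattr(servant, msg["op"])(*msg["args"])  # invoke operation
    return json.dumps({"type": "reply", "id": msg["id"], "result": result})


def unmarshal_reply(payload):
    """Client side: decode a reply frame into (request_id, result)."""
    msg = json.loads(payload)
    assert msg["type"] == "reply"
    return msg["id"], msg["result"]
```

In a real deployment each `dispatch` call would be driven by messages arriving on a persistent WebSocket, which is what gives the client the stateful, two-way channel that plain HTTP request/response lacks.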
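The weight quantization mentioned in the transfer-learning item (and implicit in SE-SqueezeNext's 0.595 MB footprint) can be illustrated with a minimal post-training affine quantization sketch. This is a generic example of the technique, not the thesis's actual FPGA toolchain; the function names and the 8-bit choice are assumptions for illustration.

```python
def quantize_weights(weights, num_bits=8):
    """Affine (asymmetric) post-training quantization of float weights.

    Maps each float weight onto an integer grid [0, 2**num_bits - 1],
    returning (quantized_weights, scale, zero_point). Illustrative of
    how CNN weights are shrunk for deployment on FPGAs or MCUs.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Guard against a degenerate all-equal weight tensor.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - w_min / scale)
    quantized = [
        max(qmin, min(qmax, round(w / scale) + zero_point))
        for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float weights from the integer grid."""
    return [(q - zero_point) * scale for q in quantized]
```

The round trip loses at most about half a quantization step per weight, which is why 8-bit quantization typically costs only a few percent of accuracy while cutting model size to a quarter of its 32-bit float footprint.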