- Browse by Author
Browsing by Author "King, Brian S."
Now showing 1 - 10 of 17
Results Per Page
Sort Options
Item AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources(2021-08) Kalgaonkar, Priyank B.; El-Sharkawy, Mohamed A.; King, Brian S.; Rizkalla, Maher E.Research work presented within this thesis propose a neoteric variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondeseNeXt replaces group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to prune (remove) redundant and insignificant elements that either are irrelevant or do not affect performance of the network upon disposition. Cardinality, a new dimension to the existing spatial dimensions, and class-balanced focal loss function, a weighting factor inversely proportional to the number of samples, has been incorporated in order to relieve the harsh effects of pruning, into the design of CondenseNeXt’s algorithm. Furthermore, extensive analyses of this novel CNN architecture was performed on three benchmarking image datasets: CIFAR-10, CIFAR-100 and ImageNet by deploying the trained weight on to an ARM-based embedded computing platform: NXP BlueBox 2.0, for real-time image classification. The outputs are observed in real-time in RTMaps Remote Studio’s console to verify the correctness of classes being predicted. CondenseNeXt achieves state-of-the-art image classification performance on three benchmark datasets including CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single model, single crop top-5 error), and up to 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, however at the cost of 2.26% in accuracy loss. Thus, performing image classification on ARM-Based computing platforms without requiring a CUDA enabled GPU support, with outstanding efficiency.Item Complex Vehicle Modeling: A Data Driven Approach(2019-12) Schoen, Alexander C.; Ben Miled, Zina; Dos Santos, Euzeli C.; King, Brian S.This thesis proposes an artificial neural network (NN) model to predict fuel consumption in heavy vehicles. The model uses predictors derived from vehicle speed, mass, and road grade. These variables are readily available from telematics devices that are becoming an integral part of connected vehicles. The model predictors are aggregated over a fixed distance traveled (i.e., window) instead of fixed time interval. It was found that 1km windows is most appropriate for the vocations studied in this thesis. Two vocations were studied, refuse and delivery trucks. The proposed NN model was compared to two traditional models. The first is a parametric model similar to one found in the literature. The second is a linear regression model that uses the same features developed for the NN model. The confidence level of the models using these three methods were calculated in order to evaluate the models variances. It was found that the NN models produce lower point-wise error. However, the stability of the models are not as high as regression models. In order to improve the variance of the NN models, an ensemble based on the average of 5-fold models was created. Finally, the confidence level of each model is analyzed in order to understand how much error is expected from each model. The mean training error was used to correct the ensemble predictions for five K-Fold models. The ensemble K-fold model predictions are more reliable than the single NN and has lower confidence interval than both the parametric and regression models.Item Deep Image Processing with Spatial Adaptation and Boosted Efficiency & Supervision for Accurate Human Keypoint Detection and Movement Dynamics Tracking(2023-05) Dai, Chao Yang; Zhang, Qingxue; King, Brian S.; Fang, ShiaofenThis thesis aims to design and develop the spatial adaptation approach through spatial transformers to improve the accuracy of human keypoint recognition models. We have studied different model types and design choices to gain an accuracy increase over models without spatial transformers and analyzed how spatial transformers increase the accuracy of predictions. A neural network called Widenet has been leveraged as a specialized network for providing the parameters for the spatial transformer. Further, we have evaluated methods to reduce the model parameters, as well as the strategy to enhance the learning supervision for further improving the performance of the model. Our experiments and results have shown that the proposed deep learning framework can effectively detect the human key points, compared with the baseline methods. Also, we have reduced the model size without significantly impacting the performance, and the enhanced supervision has improved the performance. This study is expected to greatly advance the deep learning of human key points and movement dynamics.Item Deep Transferable Intelligence for Wearable Big Data Pattern Detection(2021-08) Gangadharan, Kiirthanaa; Zhang, Qingxue; King, Brian S.; Chien, Yung-Ping S.Biomechanical Big Data is of great significance to precision health applications, among which we take special interest in Physical Activity Detection (PAD). In this study, we have performed extensive research on deep learning-based PAD from biomechanical big data, focusing on the challenges raised by the need for real-time edge inference. First, considering there are many places we can place the motion sensors, we have thoroughly compared and analyzed the location difference in terms of deep learning-based PAD performance. We have further compared the difference among six sensor channels (3-axis accelerometer and 3-axis gyroscope). Second, we have selected the optimal sensor and the optimal sensor channel, which can not only provide sensor usage suggestions but also enable ultra-lowpower application on the edge. Third, we have investigated innovative methods to minimize the training effort of the deep learning model, leveraging the transfer learning strategy. More specifically, we propose to pre-train a transferable deep learning model using the data from other subjects and then fine-tune the model using limited data from the target-user. In such a way, we have found that, for single-channel case, the transfer learning can effectively increase the deep model performance even when the fine-tuning effort is very small. This research, demonstrated by comprehensive experimental evaluation, has shown the potential of ultra-low-power PAD with minimized sensor stream, and minimized training effort.Item Development of Automated Fault Recovery Controls for Plug-Flow Biomass Reactors(2024-05) Jacob, Mariam; Schubert, Peter J.; Li, Lingxi; King, Brian S.The demand for sustainable and renewable energy sources has prompted significant research and development efforts in the field of biomass gasification. Biomass gasification technology holds significant promise for sustainable energy production, offering a renewable alternative to fossil fuels while mitigating environmental impact. This thesis presents a detailed study on the design, development, and implementation of a Plug-Flow Reactor Biomass Gasifier integrated with an Automated Auger Jam Detection System and a Blower Algorithm to maintain constant reactor pressure by varying blower speed with respect to changes in reactor pressure. The system is based on indirectly- heated pyrolytic gasification technology and is developed using Simulink™. The proposed gasification system use the principles of pyrolysis and gasification to convert biomass feedstock into syngas efficiently. An innovative plug-flow reactor configuration ensures uniform heat distribution and residence time, optimizing gasification performance and product quality. Additionally, the system incorporates an automated auger jam detection system, which utilizes sensor data to detect and mitigate auger jams in real-time, thereby enhancing operational reliability and efficiency. By monitoring these parameters, the system detects deviations from normal operating conditions indicative of auger jams and initiates corrective actions automatically. The detection algorithm is trained using test cases and validated through detailed testing to ensure accurate and reliable performance. The MATLAB™-based implementation offers flexibility, scalability, and ease of integration with existing gasifier control systems. The graphical user interface (GUI) provides operators with real-time monitoring and visualization of system status, auger performance, and detected jam events. Additionally, the system generates alerts and notifications to inform operators of detected jams, enabling timely intervention and preventive maintenance. To maintain consistent gasification conditions, a blower algorithm is developed to regulate airflow and maintain constant reactor pressure within the gasifier. The blower algorithm dynamically adjusts blower speed based on feedback from differential pressure sensors, ensuring optimal gasification performance under varying operating conditions. The integration of the blower algorithm into the gasification system contributes to stable syngas production and improved process control. The development of the Plug-Flow Reactor Biomass Gasifier, Automated Auger Jam Detection System, and Blower Algorithm is accompanied by rigorous simulation studies and experimental validation. Overall, this thesis contributes to the advancement of biomass gasification technology by presenting a detailed study on a plug flow reactor biomass gasifier with indirectly- heated pyrolytic gasification technology with an Automated Auger Jam Detection System and Blower Algorithm. The findings offer valuable insights for researchers, engineers, policymakers, and industry stakeholders supporting the transition towards cleaner and more renewable energy systems.Item Dynamic electronic asset allocation comparing genetic algorithm with particle swarm optimization(2018-12) Islam, Md Saiful; Christopher, Lauren A.; King, Brian S.; El-Sharkawy, MohamedThe contribution of this research work can be divided into two main tasks: 1) implementing this Electronic Warfare Asset Allocation Problem (EWAAP) with the Genetic Algorithm (GA); 2) Comparing performance of Genetic Algorithm to Particle Swarm Optimization (PSO) algorithm. This research problem implemented Genetic Algorithm in C++ and used QT Data Visualization for displaying three-dimensional space, pheromone, and Terrain. The Genetic algorithm implementation maintained and preserved the coding style, data structure, and visualization from the PSO implementation. Although the Genetic Algorithm has higher fitness values and better global solutions for 3 or more receivers, it increases the running time. The Genetic Algorithm is around (15-30\%) more accurate for asset counts from 3 to 6 but requires (26-82\%) more computational time. When the allocation problem complexity increases by adding 3D space, pheromones and complex terrains, the accuracy of GA is 3.71\% better but the speed of GA is 121\% slower than PSO. In summary, the Genetic Algorithm gives a better global solution in some cases but the computational time is higher for the Genetic Algorithm with than Particle Swarm Optimization.Item A Dynamically Configurable Discrete Event Simulation Framework for Many-Core System-on-Chips(2010) Barnes, Christopher J.; Lee, Jaehwan John; King, Brian S.; Chien, Yung Ping StanleyIndustry trends indicate that many-core heterogeneous processors will be the next-generation answer to Moore's law and reduced power consumption. Thus, both academia and industry are focused on the challenges presented by many-core heterogeneous processor designs. In many cases, researchers use discrete event simulators to research and validate new computer architecture innovations. However, there is a lack of dynamically configurable discrete event simulation environments for the testing and development of many-core heterogeneous processors. To fulfill this need we present Mhetero, a retargetable framework for cycle-accurate simulation of heterogeneous many-core processors along with the cycle-accurate simulation of their associated network-on-chip communication infrastructure. Mhetero is the result of research into dynamically configurable and highly flexible simulation tools with which users are free to produce custom instruction sets and communication methods in a highly modular design environment. In this thesis we will discuss our approach to dynamically configurable discrete event simulation and present several experiments performed using the framework to exemplify how Mhetero, and similarly constructed simulators, may be used for future innovations.Item Enhancing Precision of object detectors: bridging classification and localization gaps for 2D and 3D models(2024-05) Ravi, Niranjan; El-Sharkawy, Mohamed; Rizkalla, Maher E.; Li, Lingxi; King, Brian S.Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible due to Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as Camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has spiked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation. This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%. To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as when objects are inclusive of one another, two objects with the same centres, two objects with the same centres and similar aspect ratios. During our analysis, we observed existing intersections over Union (IoU) loss and its variant’s failure to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational efforts. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on PASCAL VOC and CGMU datasets. With help of ROS and NVIDIA’s devices, inference speed was observed in real-time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. The evaluation results indicated MASK_RCNN network trained on the COCO dataset, YOLOv5n6 network trained on SKU-110K and YOLOv5x trained on the custom e-scooter dataset demonstrated 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap. In the earlier parts, we primarily focused on 2D object detectors. Owing to its success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the objects’ Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and Fusion methods on 3D KITTI and nuScenes datasets. Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are volatile to harsh environmental conditions, sunlight, and blurry motion. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with MaskedAuto Encoders (MAE) and Graph Neural Networks (GNN) along with earlier proposed IIoU and ENHM. The performance evaluation on MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with overlap of 50%. This developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-art 3D object detectors for autonomous platforms.Item High Performance GNRFET Devices for High-Speed Low-Power Analog and Digital Applications(2019-05) Patnala, Mounica; Rizkalla, Maher E.; King, Brian S.; Sharkawy, Mohamed El; Ytterdal, TrondRecent ULSI (ultra large scale integration) technology emphasizes small size devices, featuring low power and high switching speed. Moore's law has been followed successfully in scaling down the silicon device in order to enhance the level of integration with high performances until conventional devices failed to cop up with further scaling due to limitations with ballistic effects, and challenges with accommodating dopant fluctuation, mobility degradation, among other device parameters. Recently, Graphene based devices o ered alternative approach, featuring small size and high performances. This includes high carrier mobility, high carrier density, high robustness, and high thermal conductivity. These unique characteristics made the Graphene devices attractive for high speed electronic architectures. In this research, Graphene devices were integrated into applications with analog, digital, and mixed signals based systems. Graphene devices were briefly explored in electronics applications since its first model developed by the University of Illinois, Champaign in 2013. This study emphasizes the validation of the model in various applications with analog, digital, and mixed signals. At the analog level, the model was used for voltage and power amplifiers; classes A, B, and AB. At the digital level, the device model was validated within the universal gates, adders, multipliers, subtractors, multiplexers, demultiplexers, encoders, and comparators. The study was also extended to include Graphene devices for serializers, the digital systems incorporated into the data structure storage. At the mixed signal level, the device model was validated for the DACs/ADCs. In all components, the features of the new devices were emphasized as compared with the existing silicon technology. The system functionality and dynamic performances were also elaborated. The study also covered the linearity characteristics of the devices within full input range operation. GNRFETs with a minimum channel length of 10nm and an input voltage 0.7V were considered in the study. An electronic design platform ADS (Advanced Design Systems) was used in the simulations. The power amplifiers showed noise figure as low as 0.064dbs for class A, and 0.32 dbs for class B, and 0.69 dbs for class AB power amplifiers. The design was stable and as high as 5.12 for class A, 1.02 for class B, and 1.014 for class AB. The stability factor was estimated at 2GHz operation. The harmonics were as low as -100 dbs for class A, -60 dbs for class B, and -50dbs for class AB, all simulated at 1GHz. The device was incorporated into ADC system, and as low as 24.5 micro Watt power consumption and 40 nsec rise time were observed. Likewise, the DAC showed low power consumption as of 4.51 micro Watt. The serializer showed as minimum power consumption of the order of 0.4mW. These results showed that these nanoscale devices have potential future for high-speed communication systems, medical devices, computer architecture and dynamic Nano electromechanical (NEMS) which provides ultra-level of integration, incorporating embedded and IoT devices supporting this technology. Results of analog and digital components showed superiority over other silicon transistor technologies in their ultra-low power consumption and high switching speed.Item Increasing CNN representational power using absolute cosine value regularization(2020-05) Singleton, William S.; El-Sharkawy, Mohamed A.; King, Brian S.; Kim, Dongsoo S.The Convolutional Neural Network (CNN) is a mathematical model designed to distill input information into a more useful representation. This distillation process removes information over time through a series of dimensionality reductions, which ultimately, grant the model the ability to resist noise, and generalize effectively. However, CNNs often contain elements that are ineffective at contributing towards useful representations. This Thesis aims at providing a remedy for this problem by introducing Absolute Cosine Value Regularization (ACVR). This is a regularization technique hypothesized to increase the representational power of CNNs by using a Gradient Descent Orthogonalization algorithm to force the vectors that constitute their filters at any given convolutional layer to occupy unique positions in in their respective spaces. This method should in theory, lead to a more effective balance between information loss and representational power, ultimately, increasing network performance. The following Thesis proposes and examines the mathematics and intuition behind ACVR, and goes on to propose Dynamic-ACVR (D-ACVR). This Thesis also proposes and examines the effects of ACVR on the filters of a low-dimensional CNN, as well as the effects of ACVR and D-ACVR on traditional Convolutional filters in VGG-19. Finally, this Thesis proposes and examines regularization of the Pointwise filters in MobileNetv1.