Browsing by Subject "GPU"
Now showing 1 - 7 of 7
Item: Deep Learning with Go (2020-05)
Stinson, Derek L.; Ben Miled, Zina; King, Brian; Rizkalla, Maher
Current research in deep learning is primarily focused on using Python as a support language. Go, an emerging language with many benefits including native support for concurrency, has seen a rise in adoption over the past few years. However, it is not widely used to develop learning models due to the lack of supporting libraries and frameworks for model development. This thesis explores the use of Go for the development of neural network models in general, and convolutional neural networks in particular. The proposed study is based on a Go-CUDA implementation of neural network models called GoCuNets. This implementation is then compared to ConvNetGo, a Go-CPU deep learning implementation that takes advantage of Go's built-in concurrency. A comparison of these two implementations shows a significant performance gain when using GoCuNets.

Item: GPU Accelerated Browser for Neuroimaging Genomics (Springer, 2018-10)
Zigon, Bob; Li, Huang; Yao, Xiaohui; Fang, Shiaofen; Hasan, Mohammad Al; Yan, Jingwen; Moore, Jason H.; Saykin, Andrew J.; Shen, Li; Alzheimer’s Disease Neuroimaging Initiative; Computer and Information Science, School of Science
Neuroimaging genomics is an emerging field that provides exciting opportunities to understand the genetic basis of brain structure and function. The unprecedented scale and complexity of the imaging and genomics data, however, have presented critical computational bottlenecks. In this work we present our initial efforts towards building an interactive visual exploratory system for mining big data in neuroimaging genomics. A GPU-accelerated browsing tool for neuroimaging genomics is created that implements the ANOVA algorithm for single nucleotide polymorphism (SNP) based analysis and the VEGAS algorithm for gene-based analysis, and executes them at interactive rates. The ANOVA algorithm is 110 times faster than its 4-core OpenMP version, while the VEGAS algorithm is 375 times faster than its 4-core OpenMP counterpart. This approach lays a solid foundation for researchers to address the challenges of mining large-scale imaging genomics datasets via interactive visual exploration.

Item: GPU-OSDDA: A Bit-Vector GPU-based Deadlock Detection Algorithm for Single-Unit Resource Systems (Taylor & Francis, 2015-09)
Abell, Stephen; Nhan, Do; Lee, John J.; Department of Electrical and Computer Engineering
This article presents a GPU-based single-unit deadlock detection methodology and its algorithm, GPU-OSDDA. Our design utilizes the parallel hardware of the GPU to perform computations and is thus able to overcome the major limitation of prior hardware-based approaches: it can handle thousands of processes and resources while achieving real-world run-times. By utilizing a bit-vector technique for storing algorithm matrices and designing novel, efficient algorithmic methods, we not only reduce memory usage dramatically but also achieve a speedup of two orders of magnitude over CPU equivalents. Additionally, GPU-OSDDA acts as an interactive service to the CPU, because all of the aforementioned computations and matrix management techniques take place on the GPU, requiring minimal interaction with the CPU. GPU-OSDDA is implemented on three GPU cards: Tesla C2050, Tesla K20c, and Titan X. Our design shows overall speedups of 6-595x over CPU equivalents.
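
To make the bit-vector technique concrete, here is a minimal CUDA sketch of the underlying idea, assuming hypothetical matrix names and layout rather than the published GPU-OSDDA source: each matrix row is packed into 32-bit words, so a single bitwise instruction covers 32 matrix entries at once.

```cuda
// Illustrative sketch only: 'request', 'available', and 'blocked' are assumed
// names for bit-packed matrices, not identifiers from GPU-OSDDA itself.
#include <cstdint>
#include <cuda_runtime.h>

// One thread per 32-bit word of the numProcs x wordsPerRow request matrix.
// A set bit in 'blocked' marks a resource a process requests but that is
// not currently available, i.e. a potential wait edge.
__global__ void mark_blocked(const uint32_t* request,    // numProcs * wordsPerRow words
                             const uint32_t* available,  // wordsPerRow words
                             uint32_t* blocked,          // numProcs * wordsPerRow words
                             int numProcs, int wordsPerRow)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numProcs * wordsPerRow) return;
    int word = idx % wordsPerRow;                  // column word within this row
    blocked[idx] = request[idx] & ~available[word]; // 32 entries per instruction
}
```

The packing is what drives both effects the article reports: the matrices shrink by a factor of 32 relative to a one-byte-per-entry layout, and each bitwise operation does the work of 32 scalar comparisons.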
Item: Parallel acceleration of deadlock detection and avoidance algorithms on GPUs (2013-08)
Abell, Stephen W.; Lee, Jaehwan John; King, Brian; Chien, Stanley
Current mainstream computing systems have become increasingly complex. Most have Central Processing Units (CPUs) that invoke multiple threads for their computing tasks. The growing issue with these systems is resource contention, and with resource contention comes the risk of deadlock. Various software and hardware approaches exist that implement deadlock detection/avoidance techniques; however, they lack either the speed or the problem-size capability needed for real-time systems. The research conducted for this thesis aims to resolve the issues present in past approaches by converging the two platforms (software and hardware) by means of the Graphics Processing Unit (GPU). Presented in this thesis are two GPU-based deadlock detection algorithms and one GPU-based deadlock avoidance algorithm: (i) GPU-OSDDA, a GPU-based single-unit resource deadlock detection algorithm; (ii) GPU-LMDDA, a GPU-based multi-unit resource deadlock detection algorithm; and (iii) GPU-PBA, a GPU-based deadlock avoidance algorithm. Both GPU-OSDDA and GPU-LMDDA utilize the Resource Allocation Graph (RAG) to represent resource allocation status in the system. However, the RAG is represented using integer-length bit-vectors. The advantages brought forth by this approach are plenty: (i) less memory is required for algorithm matrices, (ii) 32 computations are performed per instruction (in most cases), and (iii) the algorithms can handle large numbers of processes and resources. The deadlock detection algorithms also require minimal interaction with the CPU, since matrix storage and algorithm computations are implemented on the GPU, thus behaving as an interactive service. As a result of this approach, both algorithms achieve speedups over two orders of magnitude higher than their serial CPU implementations (3.17-317.42x for GPU-OSDDA and 37.17-812.50x for GPU-LMDDA). Lastly, GPU-PBA is the first parallel deadlock avoidance algorithm implemented on the GPU. While it does not achieve a two-orders-of-magnitude speedup over its CPU implementation, it does provide a platform for future deadlock avoidance research on the GPU.

Item: Parallel Processing For Adaptive Optics Optical Coherence Tomography (AO-OCT) Image Registration Using GPU (2016-07-08)
Do, Nhan Hieu; Lee, John Jaehwan; Miller, Donald T.; King, Brian; Salama, Paul
Adaptive Optics Optical Coherence Tomography (AO-OCT) is a high-speed, high-resolution ophthalmic imaging technique offering detailed 3D analysis of retinal structure in vivo. However, AO-OCT volume images are sensitive to involuntary eye movements that occur even during steady fixation, including tremor, drifts, and micro-saccades. To correct eye-motion artifacts within a volume and to stabilize a sequence of volumes acquired of the same retinal area, we propose a stripe-wise 3D image registration algorithm based on phase correlation. In addition, using several ideas such as a coarse-to-fine approach, spike-noise filtering, pre-computation caching, and parallel processing on a GPU, our approach can register a volume of size 512 x 512 x 512 in less than 6 seconds, a 33x speedup compared to an equivalent CPU version in MATLAB. Moreover, our 3D registration approach is reliable even in the presence of large motions (micro-saccades) that distort the volumes. Such motion was an obstacle for a previous en face approach based on 2D projected images. The thesis also investigates GPU implementations of 3D phase correlation and 2D normalized cross-correlation, which could be useful for other image processing algorithms.
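
For readers unfamiliar with phase correlation, the following is a minimal, whole-volume CUDA/cuFFT sketch of the idea: transform both volumes, form the normalized cross-power spectrum, invert the transform, and read the shift off the peak location. The thesis applies this stripe-wise with coarse-to-fine refinement and spike-noise filtering, none of which is shown here, and all buffer names and launch parameters below are assumptions for the example.

```cuda
#include <cufft.h>
#include <cuda_runtime.h>

// Normalized cross-power spectrum: conj(A) * B divided by its magnitude,
// so only phase information (i.e. the translation) is retained.
__global__ void cross_power(const cufftComplex* A, const cufftComplex* B,
                            cufftComplex* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float re  = A[i].x * B[i].x + A[i].y * B[i].y;
    float im  = A[i].x * B[i].y - A[i].y * B[i].x;
    float mag = sqrtf(re * re + im * im) + 1e-12f;  // guard against divide-by-zero
    out[i].x = re / mag;
    out[i].y = im / mag;
}

// Host side (error handling omitted). dA and dB hold the two volumes as
// complex samples on the device; after the inverse FFT, the index of the
// peak in dOut gives the (z, y, x) translational shift between them.
void phase_correlate(cufftComplex* dA, cufftComplex* dB, cufftComplex* dOut,
                     int nx, int ny, int nz)
{
    int n = nx * ny * nz;
    cufftHandle plan;
    cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);
    cufftExecC2C(plan, dA, dA, CUFFT_FORWARD);
    cufftExecC2C(plan, dB, dB, CUFFT_FORWARD);
    cross_power<<<(n + 255) / 256, 256>>>(dA, dB, dOut, n);
    cufftExecC2C(plan, dOut, dOut, CUFFT_INVERSE);
    cufftDestroy(plan);
}
```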
Item: Real-time adaptive-optics optical coherence tomography (AOOCT) image reconstruction on a GPU (2014)
Shafer, Brandon Andrew; Eberhart, Russell C.; Salama, Paul; Christopher, Lauren; Lee, Jaehwan (John); King, Brian
Adaptive-optics optical coherence tomography (AOOCT) is a technology that has advanced rapidly in recent years and offers remarkable capabilities for scanning the human eye in vivo. In order to bring its ultra-high-resolution capabilities to clinical use, however, newer technology needs to be used in the image reconstruction process. General-purpose computation on graphics processing units is one way that this computationally intensive reconstruction can be performed on a desktop computer in real time. This work shows the process of AOOCT image reconstruction, the basics of how to use NVIDIA's CUDA to write parallel code, and a new AOOCT image reconstruction technology implemented using NVIDIA's CUDA. The results demonstrate that image reconstruction can be done in real time with high accuracy using a GPU.

Item: A scalable approach to processing adaptive optics optical coherence tomography data from multiple sensors using multiple graphics processing units (2014-12)
Kriske, Jeffery Edward, Jr.; Song, Fengguang; Lee, Jaehwan; Raje, Rajeev
Adaptive optics-optical coherence tomography (AO-OCT) is a non-invasive method of imaging the human retina in vivo. It can be used to visualize microscopic structures, making it incredibly useful for the early detection and diagnosis of retinal disease. The research group at Indiana University has a novel multi-camera AO-OCT system capable of 1 MHz acquisition rates. Until now, no method has existed to process data from such a system quickly and accurately enough on a CPU or a single GPU, or to scale automatically and efficiently to multiple GPUs. This has been a barrier to using an MHz-rate AO-OCT system in a clinical environment. A novel approach to processing AO-OCT data from the unique multi-camera optics system is tested on multiple graphics processing units (GPUs) in parallel with one-, two-, and four-camera combinations. The design and results demonstrate a scalable, reusable, extensible method of computing AO-OCT output. This approach can either achieve real-time results with an AO-OCT system capable of 1 MHz acquisition rates or be scaled to a higher-accuracy mode using a fast Fourier transform of 16,384 complex values.
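
As an illustration of how a multi-camera, multi-GPU pipeline might distribute work, here is a minimal CUDA sketch assuming one device and one stream per camera; the CameraJob structure, the reconstruct_camera kernel placeholder, and the buffer layout are assumptions for this example and are not taken from the thesis.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-camera job: each camera's batches are copied to and
// processed on its own GPU, on its own stream, so all devices stay busy
// while new acquisition batches arrive.
struct CameraJob {
    int device;            // GPU assigned to this camera
    cudaStream_t stream;   // stream created on that GPU
    float* dRaw;           // device buffer for one batch of spectral frames
    size_t bytes;          // batch size in bytes
};

void process_batch(std::vector<CameraJob>& jobs,
                   const std::vector<float*>& pinnedHostFrames)
{
    // Launch copies (and, in a full pipeline, reconstruction kernels) on each GPU.
    for (size_t i = 0; i < jobs.size(); ++i) {
        cudaSetDevice(jobs[i].device);
        // Host buffers are assumed page-locked so the copy truly overlaps compute.
        cudaMemcpyAsync(jobs[i].dRaw, pinnedHostFrames[i], jobs[i].bytes,
                        cudaMemcpyHostToDevice, jobs[i].stream);
        // reconstruct_camera<<<grid, block, 0, jobs[i].stream>>>(jobs[i].dRaw, ...);
    }
    // Wait for every camera's work to finish before assembling the combined output.
    for (size_t i = 0; i < jobs.size(); ++i) {
        cudaSetDevice(jobs[i].device);
        cudaStreamSynchronize(jobs[i].stream);
    }
}
```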