Browsing by Author "Fu, Yuankun"
Item: Accelerating complex modeling workflows in CyberWater using on-demand HPC/Cloud resources (IEEE, 2021-09)
Li, Feng; Chen, Ranran; Fu, Yuankun; Song, Fengguang; Liang, Yao; Ranawaka, Isuru; Pamidighantam, Sudhakar; Luna, Daniel; Liang, Xu
Computer Information and Graphics Technology, School of Engineering and Technology
Workflow management systems (WMSs) are commonly used to organize/automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich, local environments for fast prototyping and refinements before they consider using more powerful computing resources. However, existing WMSs do not simultaneously support local interactive workflow environments and HPC resources. In this paper, we present an on-demand access mechanism to remote HPC resources from desktop/laptop-based workflow management software to compose, monitor and analyze scientific workflows in the CyberWater project. CyberWater is an open-data and open-modeling software framework for environmental and water communities. In this work, we extend the open-model, open-data design of CyberWater with on-demand HPC accessing capacity. In particular, we design and implement the LaunchAgent library, which can be integrated into the local desktop environment to allow on-demand usage of remote resources for hydrology-related workflows. LaunchAgent manages authentication to remote resources, prepares the computationally-intensive or data-intensive tasks as batch jobs, submits jobs to remote resources, and monitors the quality of services for the users. LaunchAgent interacts seamlessly with other existing components in CyberWater, which is now able to provide advantages of both feature-rich desktop software experience and increased computation power through on-demand HPC/Cloud usage.
In our evaluations, we demonstrate how a hydrology workflow that consists of both local and remote tasks can be constructed and show that the added on-demand HPC/Cloud usage helps speed up hydrology workflows while allowing intuitive workflow configurations and execution using a desktop graphical user interface.

Item: Designing a Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems (IEEE, 2018-09)
Fu, Yuankun; Li, Feng; Song, Fengguang; Zhu, Luoding
Computer and Information Science, School of Science
The lattice Boltzmann method (LBM) is an important computational fluid dynamics (CFD) approach to solving the Navier-Stokes equations and simulating complex fluid flows. LBM is also well known as a memory-bound problem, and its performance is limited by the memory access time on modern computer systems. In this paper, we design and develop both sequential and parallel memory-aware algorithms to optimize the performance of LBM. The new memory-aware algorithms can enhance data reuse across multiple time steps to further improve the performance of the original and fused LBM. We theoretically analyze the algorithms to provide insight into how data reuse occurs in each algorithm. Finally, we conduct experiments and detailed performance analysis on two different manycore systems.
Based on the experimental results, the parallel memory-aware LBM algorithm can outperform the fused LBM by up to 292% on the Intel Haswell system when using 28 cores, and by 302% on the Intel Skylake system when using 48 cores.

Item: Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis (Elsevier, 2016-06)
Fu, Yuankun; Song, Fengguang; Zhu, Luoding
Department of Computer & Information Science, School of Science
With the emergence of exascale computing and big data analytics, many important scientific applications require the integration of computationally intensive modeling and simulation with data-intensive analysis to accelerate scientific discovery. In this paper, we create an analytical model to steer the optimization of the end-to-end time-to-solution for the integrated computation and data analysis. We also design and develop an intelligent data broker to efficiently intertwine the computation stage and the analysis stage to practically achieve the optimal time-to-solution predicted by the analytical model. We perform experiments on both synthetic applications and real-world computational fluid dynamics (CFD) applications. The experiments show that the analytical model exhibits an average relative error of less than 10%, and the application performance can be improved by up to 131% for the synthetic programs and by up to 78% for the real-world CFD application.

Item: Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up (ACM, 2018-06)
Fu, Yuankun; Li, Feng; Song, Fengguang; Chen, Zizhong
Computer and Information Science, School of Science
This paper targets an important class of applications that requires combining HPC simulations with data analysis for online or real-time scientific discovery.
We use state-of-the-art parallel-IO and data-staging libraries to build simulation-time data analysis workflows, and conduct performance analysis with real-world applications of computational fluid dynamics (CFD) simulations and molecular dynamics (MD) simulations. Driven by in-depth performance inefficiency analysis, we design an end-to-end application-level approach to eliminating the interlocks and synchronizations present in existing methods. Our new approach employs both task parallelism and pipeline parallelism to reduce synchronizations effectively. In addition, we design a fully asynchronous, fine-grain, and pipelining runtime system, which is named Zipper. Zipper is a multi-threaded distributed runtime system and executes in a layer below the simulation and analysis applications. To further reduce the simulation application's stall time and enhance the data transfer performance, we design a concurrent data transfer optimization that uses both the HPC network and the parallel file system for improved bandwidth. The scalability of the Zipper system has been verified by a performance model and various empirical large-scale experiments. The experimental results on an Intel multicore cluster as well as a Knights Landing HPC system demonstrate that the Zipper-based approach can outperform the fastest state-of-the-art I/O transport library by up to 220% using 13,056 processor cores.
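
The pipeline parallelism mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Zipper runtime or its API; it is a single-process Python analogy, and every name in it (`simulate`, `analyze`, `run_pipeline`, `queue_depth`) is hypothetical. It only shows the general idea of a simulation stage handing data blocks to an analysis stage through a bounded buffer, so that the two stages overlap instead of synchronizing at every time step.

```python
import queue
import threading


def simulate(num_steps, out_queue):
    """Hypothetical simulation stage: emit one data block per time step."""
    for step in range(num_steps):
        # Stand-in for real simulation output (assumed data, not from the paper).
        block = [step * 10 + i for i in range(4)]
        # put() blocks only when the bounded queue is full.
        out_queue.put((step, block))
    out_queue.put(None)  # sentinel: no more blocks will follow


def analyze(in_queue, results):
    """Hypothetical analysis stage: consume blocks as soon as they arrive."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        step, block = item
        results[step] = sum(block)  # stand-in for a real analysis kernel


def run_pipeline(num_steps=5, queue_depth=2):
    """Run both stages concurrently and return the per-step analysis results."""
    blocks = queue.Queue(maxsize=queue_depth)
    results = {}
    producer = threading.Thread(target=simulate, args=(num_steps, blocks))
    consumer = threading.Thread(target=analyze, args=(blocks, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch, `queue_depth` bounds how far the simulation may run ahead of the analysis; a larger value trades memory for fewer producer stalls.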