CyberWater: An Open Framework for Data and Model Integration

Date
2024-05
Language
American English
Embargo Lift Date
Department
Committee Chair
Degree
Ph.D.
Degree Year
2024
Department
Computer & Information Science
Grantor
Purdue University
Journal Title
Journal ISSN
Volume Title
Found At
Can't use the file because of accessibility barriers? Contact us with the title of the item, permanent link, and specifics of your accommodation need.
Abstract

Workflow management systems (WMSs) are commonly used to organize/automate sequences of tasks as workflows to accelerate scientific discoveries. During complex workflow modeling, a local interactive workflow environment is desirable, as users usually rely on their rich, local environments for fast prototyping and refinements before they consider using more powerful computing resources.

This dissertation delves into the innovative development of the CyberWater framework based on Workflow Management Systems (WMSs). Against the backdrop of data-intensive and complex models, CyberWater exemplifies the transition of intricate data into insightful and actionable knowledge and introduces the nuanced architecture of CyberWater, particularly focusing on its adaptation and enhancement from the VisTrails system. It highlights the significance of control and data flow mechanisms and the introduction of new data formats for effective data processing within the CyberWater framework.

This study presents an in-depth analysis of the design and implementation of Generic Model Agent Toolkits. The discussion centers on template-based component mechanisms and the integration with popular platforms, while emphasizing the toolkits ability to facilitate on-demand access to High-Performance Computing resources for large-scale data handling. Besides, the development of an asynchronously controlled workflow within CyberWater is also explored. This innovative approach enhances computational performance by optimizing pipeline-level parallelism and allows for on-demand submissions of HPC jobs, significantly improving the efficiency of data processing.

A comprehensive methodology for model-driven development and Python code integration within the CyberWater framework and innovative applications of GPT models for automated data retrieval are introduced in this research as well. It examines the implementation of Git Actions for system automation in data retrieval processes and discusses the transformation of raw data into a compatible format, enhancing the adaptability and reliability of the data retrieval component in the adaptive generic model agent toolkit component. For the development and maintenance of software within the CyberWater framework, the use of tools like GitHub for version control and outlining automated processes has been applied for software updates and error reporting. Except that, the user data collection also emphasizes the role of the CyberWater Server in these processes.

In conclusion, this dissertation presents our comprehensive work on the CyberWater framework’s advancements, setting new standards in scientific workflow management and demonstrating how technological innovation can significantly elevate the process of scientific discovery.

Description
IUPUI
item.page.description.tableofcontents
item.page.relation.haspart
Cite As
ISSN
Publisher
Series/Report
Sponsorship
Major
Extent
Identifier
Relation
Journal
Source
Alternative Title
Type
Thesis
Number
Volume
Conference Dates
Conference Host
Conference Location
Conference Name
Conference Panel
Conference Secretariat Location
Version
Full Text Available at
This item is under embargo {{howLong}}