CrossFacilityWorkflows

Workflow Tool Decider

This tool is designed to help you choose the best workflow tool for your ASCR facilities workloads.

GNU Parallel

GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
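The one-job-per-input-line model described above can be sketched in Python using only the standard library. This is an illustration of the idea, not GNU parallel itself: the helper names `run_job` and `run_parallel` and the upper-casing one-liner used as the job are hypothetical, chosen only so the example runs anywhere Python is installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(arg):
    """Run one job: a single command invoked with one input item as its argument."""
    # Hypothetical job: a tiny Python one-liner that upper-cases its argument.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", arg]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_parallel(inputs, max_workers=4):
    """Run run_job once per input, several at a time, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, inputs))


if __name__ == "__main__":
    # The input list plays the role of the lines GNU parallel would read.
    print(run_parallel(["a.txt", "b.txt", "c.txt"]))
```

Threads are sufficient here because each job is a separate process; the pool only limits how many jobs run at once.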

Parsl

Parsl extends parallelism in Python beyond a single computer. You can use Parsl just like Python's parallel executors but across multiple cores and nodes. However, the real power of Parsl is in expressing multi-step workflows of functions. Parsl lets you chain functions together and will launch each function as inputs and computing resources are available.
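The executor model that Parsl builds on can be sketched with Python's own `concurrent.futures`. This example deliberately uses only the standard library and is not Parsl's API; the functions `square`, `total`, and `run_workflow` are hypothetical. It shows a two-step workflow in which the second step starts only once the outputs of the first are available. In Parsl, the waiting done by hand here is handled by the library when futures are passed from one app to the next.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Step 1: an independent task that can run alongside its siblings."""
    return x * x


def total(values):
    """Step 2: a task that needs every step 1 result as input."""
    return sum(values)


def run_workflow(numbers):
    """Chain step 1 and step 2 through futures."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Launch all step 1 tasks at once; each returns a future immediately.
        squares = [pool.submit(square, n) for n in numbers]
        # Step 2 is launched only after its inputs have been produced.
        inputs = [future.result() for future in squares]
        return pool.submit(total, inputs).result()


if __name__ == "__main__":
    print(run_workflow([1, 2, 3]))
```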

FireWorks

FireWorks is a free, open-source code for defining, managing, and executing workflows. Complex workflows can be defined using Python, JSON, or YAML, are stored using MongoDB, and can be monitored through a built-in web interface. FireWorks has dynamic workflows, failure-detection routines, and built-in tools and execution modes for running high-throughput computations at large computing centers.

Balsam

Balsam is a unified platform for managing high-throughput workflows across the HPC landscape. It was developed by the ALCF Data Science group to optimize workflow execution on ALCF systems and elsewhere.

TaskFarmer

TaskFarmer is a workflow manager developed in-house at NERSC to coordinate single-core or multicore tasks. It tracks which tasks have completed successfully and allows straightforward resubmission of failed or un-run jobs from a task list.
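The bookkeeping described above, recording which tasks succeeded so that only failed or un-run tasks are submitted again, can be sketched in Python as follows. This is not TaskFarmer's interface: `run_pending`, the `tasks` dictionary, and the two sample commands are hypothetical, and the tasks run one after another rather than being farmed out to workers.

```python
import subprocess
import sys


def run_pending(tasks, completed):
    """Run every task not yet in `completed`; return the names of tasks that failed."""
    failed = []
    for name, argv in tasks.items():
        if name in completed:
            continue  # finished in an earlier pass, so skip it
        if subprocess.run(argv).returncode == 0:
            completed.add(name)  # remember the success for later passes
        else:
            failed.append(name)
    return failed


# Hypothetical task list: one command that succeeds and one that fails.
tasks = {
    "ok": [sys.executable, "-c", "pass"],
    "bad": [sys.executable, "-c", "raise SystemExit(1)"],
}
completed = set()
failed = run_pending(tasks, completed)
```

Calling `run_pending(tasks, completed)` a second time reruns only the tasks that are still missing from `completed`.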

Snakemake

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described in a human-readable, Python-based language. They can be scaled to server, cluster, grid, and cloud environments without modifying the workflow definition. Snakemake workflows can also include a description of the required software, which is automatically deployed to any execution environment.

Name | Python API | Complex DAG | Single node | MPI | High throughput | Data transfer | Persistence of Data | Good Documentation
GNU Parallel x x x
Parsl x x x x
FireWorks x x x
Balsam x x x
TaskFarmer x
Snakemake x x x x x