
Commit a2ffbf6

Update the comparison to other tools. (#736)
1 parent 69daffc commit a2ffbf6

File tree

2 files changed: +75 / -86 lines


CHANGELOG.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -11,6 +11,8 @@ releases are available on [PyPI](https://pypi.org/project/pytask) and
   default pickle protocol.
 - {pull}`???` adapts the interactive debugger integration to Python 3.14's
   updated `pdb` behaviour and keeps pytest-style capturing intact.
+- {pull}`???` updates the comparison to other tools documentation and adds a section on
+  the Common Workflow Language (CWL) and WorkflowHub.
 
 ## 0.5.7 - 2025-11-22
```

docs/source/explanations/comparison_to_other_tools.md

Lines changed: 73 additions & 86 deletions

```diff
@@ -10,124 +10,111 @@ in other WMFs.
 
 ## [snakemake](https://github.com/snakemake/snakemake)
 
-Pros
-
-- Very mature library and probably the most adapted library in the realm of scientific
-  workflow software.
-- Can scale to clusters and use Docker images.
-- Supports Python and R.
-- Automatic test case generation.
-
-Cons
-
-- Need to learn snakemake's syntax which is a mixture of Make and Python.
-- No debug mode.
-- Seems to have no plugin system.
+Snakemake is one of the most widely adopted workflow systems in scientific computing. It
+scales from local execution to clusters and cloud environments, with built-in support
+for containers and conda environments. Workflows are defined using a DSL that combines
+Make-style rules with Python, and can be exported to CWL for portability.
```
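To make the Make-plus-Python flavour of the DSL concrete, a Snakemake rule might look like the following sketch (the rule name and file paths are invented for illustration):

```
rule word_count:
    input:
        "data/corpus.txt"
    output:
        "results/counts.txt"
    shell:
        "wc -w {input} > {output}"
```

Each rule declares its inputs and outputs declaratively, Make-style, while arbitrary Python can be mixed in around the rules.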

```diff
 ## [ploomber](https://github.com/ploomber/ploomber)
 
-General
-
-- Strong focus on machine learning pipelines, training, and deployment.
-- Integration with tools such as MLflow, Docker, AWS Batch.
-- Tasks can be defined in yaml, python files, Jupyter notebooks or SQL.
-
-Pros
-
-- Conversion from Jupyter notebooks to tasks via
-  [soorgeon](https://github.com/ploomber/soorgeon).
-
-Cons
-
-- Programming in Jupyter notebooks increases the risk of coding errors (e.g.
-  side-effects).
-- Supports parametrizations in form of cartesian products in `yaml` files, but not more
-  powerful parametrizations.
+Ploomber focuses on machine learning pipelines with strong integration into MLflow,
+Docker, and AWS Batch. Tasks can be defined in YAML, Python files, Jupyter notebooks, or
+SQL, and it can convert notebooks into pipeline tasks.
 
 ## [Waf](https://waf.io)
 
-Pros
-
-- Mature library.
-- Can be extended.
-
-Cons
-
-- Focus on compiling binaries, not research projects.
-- Bus factor of 1.
+Waf is a mature build system primarily designed for compiling software projects. It
+handles complex build dependencies and can be extended with Python.
 
 ## [nextflow](https://github.com/nextflow-io/nextflow)
 
-- Tasks are scripted using Groovy which is a superset of Java.
-- Supports AWS, Google, Azure.
-- Supports Docker, Shifter, Podman, etc.
+Nextflow is a workflow system popular in bioinformatics that runs on AWS, Google Cloud,
+and Azure. It uses Groovy (a JVM language) for scripting and has strong support for
+containers including Docker, Singularity, and Podman.
 
 ## [Kedro](https://github.com/kedro-org/kedro)
 
-Pros
-
-- Mature library, used by some institutions and companies. Created inside McKinsey.
-- Provides the full package: templates, pipelines, deployment
+Kedro is a mature workflow framework developed at McKinsey that provides project
+templates, data catalogs, and deployment tooling. It is designed for production machine
+learning pipelines with a focus on software engineering best practices.
 
 ## [pydoit](https://github.com/pydoit/doit)
 
-General
-
-- A general task runner which focuses on command line tools.
-- You can think of it as an replacement for make.
-- Powers Nikola, a static site generator.
+pydoit is a general-purpose task runner that serves as a Python replacement for Make. It
+focuses on executing command-line tools and powers projects like Nikola, a static site
+generator.
 
 ## [Luigi](https://github.com/spotify/luigi)
 
-General
-
-- A build system written by Spotify.
-- Designed for any kind of long-running batch processes.
-- Integrates with many other tools like databases, Hadoop, Spark, etc..
-
-Cons
-
-- Very complex interface and a lot of stuff you probably don't need.
-- [Development](https://github.com/spotify/luigi/graphs/contributors) seems to stall.
+Luigi is a workflow system built by Spotify for long-running batch processes. It
+integrates with Hadoop, Spark, and various databases for large-scale data pipelines.
+Development has slowed in recent years.
 
 ## [sciluigi](https://github.com/pharmbio/sciluigi)
 
-sciluigi aims to be a lightweight wrapper around luigi.
-
-Cons
-
-- [Development](https://github.com/pharmbio/sciluigi/graphs/contributors) has basically
-  stalled since 2018.
-- Not very popular compared to its lifetime.
+sciluigi is a lightweight wrapper around Luigi aimed at simplifying scientific workflow
+development. It reduces some of Luigi's boilerplate for research use cases. Development
+has stalled since 2018.
 
 ## [scipipe](https://github.com/scipipe/scipipe)
 
-Cons
+SciPipe is a workflow library written in Go for building robust, flexible pipelines
+using Flow-Based Programming principles. It compiles workflows to fast binaries and is
+designed for bioinformatics and cheminformatics applications involving command-line
+tools.
 
-- [Development](https://github.com/scipipe/scipipe/graphs/contributors) slowed down.
-- Written in Go.
+## [SCons](https://github.com/SCons/scons)
 
-## [Scons](https://github.com/SCons/scons)
-
-Pros
-
-- Mature library.
-
-Cons
-
-- Seems to have no plugin system.
+SCons is a mature, cross-platform software construction tool that serves as an improved
+substitute for Make. It uses Python scripts for configuration and has built-in support
+for C, C++, Java, Fortran, and automatic dependency analysis.
 
 ## [pypyr](https://github.com/pypyr/pypyr)
 
-General
+pypyr is a task-runner for automation pipelines defined in YAML. It provides built-in
+steps for common operations like loops, conditionals, retries, and error handling
+without requiring custom code, and is often used for CI/CD and DevOps automation.
+
+## [ZenML](https://github.com/zenml-io/zenml)
 
-- A general task-runner with task defined in yaml files.
+ZenML is an MLOps framework for building portable ML pipelines that can run on various
+orchestrators including Kubernetes, AWS SageMaker, GCP Vertex AI, Kubeflow, and Airflow.
+It focuses on productionizing ML workflows with features like automatic
+containerization, artifact tracking, and native caching.
 
-## [zenml](https://github.com/zenml-io/zenml)
+## [Flyte](https://github.com/flyteorg/flyte)
 
-## [flyte](https://github.com/flyteorg/flyte)
+Flyte is a Kubernetes-native workflow orchestration platform for building
+production-grade data and ML pipelines. It provides automatic retries, checkpointing,
+failure recovery, and scales dynamically across cloud providers including AWS, GCP, and
+Azure.
 
 ## [pipefunc](https://github.com/pipefunc/pipefunc)
 
-A tool for executing graphs made out of functions. More focused on computational
-compared to workflow graphs.
+pipefunc is a lightweight library for creating function pipelines as directed acyclic
+graphs (DAGs) in pure Python. It automatically handles execution order, supports
+map-reduce operations, parallel execution, and provides resource profiling.
+
+## [Common Workflow Language (CWL)](https://www.commonwl.org/)
+
+CWL is an open standard for describing data analysis workflows in a portable,
+language-agnostic format. Its primary goal is to enable workflows to be written once and
+executed across different computing environments—from local workstations to clusters,
+cloud, and HPC systems—without modification. Workflows described in CWL can be
+registered on [WorkflowHub](https://workflowhub.eu/) for sharing and discovery following
+FAIR (Findable, Accessible, Interoperable, Reusable) principles.
+
+CWL is particularly prevalent in bioinformatics and life sciences where reproducibility
+across institutions is critical. Tools that support CWL include
+[cwltool](https://github.com/common-workflow-language/cwltool) (the reference
+implementation), [Toil](https://github.com/DataBiosphere/toil),
+[Arvados](https://arvados.org/), and [REANA](https://reanahub.io/). Some workflow
+systems like Snakemake and Nextflow can export workflows to CWL format.
+
+pytask is not a CWL-compliant tool because it operates on a fundamentally different
+model. CWL describes workflows as graphs of command-line tool invocations where data
+flows between tools via files. pytask, in contrast, orchestrates Python functions that
+can execute arbitrary code, manipulate data in memory, call APIs, or perform any
+operation available in Python. This Python-native approach enables features like
+interactive debugging but means pytask workflows cannot be represented in CWL's
+command-line-centric specification.
```
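For context on what the command-line-centric model looks like, a CWL description is a declarative YAML document. A minimal `CommandLineTool`, in the spirit of the CWL user guide's hello-world example, might look like this sketch:

```yaml
# Wrap the `echo` program as a CWL tool: one positional string input, no outputs.
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
```

Every step is an invocation of a command-line program with declared inputs and outputs, which is what makes the format portable across engines but too restrictive for tasks that hold state in Python memory.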

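By contrast, the Python-native task model can be sketched in plain Python. The function name and file name below are invented for illustration, and pytask's real decorators and `Product` annotations are omitted so the snippet runs without pytask installed:

```python
from pathlib import Path
import tempfile

# pytask collects functions whose names start with ``task_`` and re-runs a
# task only when its dependencies or products change. This plain-Python
# sketch mimics that shape without importing pytask.
def task_summarize(produces: Path) -> None:
    # Arbitrary Python is allowed: compute in memory, then persist the product.
    numbers = [1, 2, 3, 4]
    produces.write_text(f"sum={sum(numbers)}")

out = Path(tempfile.mkdtemp()) / "summary.txt"
task_summarize(produces=out)
result = out.read_text()  # "sum=10"
```

Because the task body is ordinary Python rather than a wrapped command-line invocation, it can be stepped through with a debugger, which is the trade-off against CWL-style portability described above.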