@@ -10,124 +10,111 @@ in other WMFs.
1010
1111## [snakemake](https://github.com/snakemake/snakemake)
1212
13- Pros
14-
15- - Very mature library and probably the most adapted library in the realm of scientific
16- workflow software.
17- - Can scale to clusters and use Docker images.
18- - Supports Python and R.
19- - Automatic test case generation.
20-
21- Cons
22-
23- - Need to learn snakemake's syntax which is a mixture of Make and Python.
24- - No debug mode.
25- - Seems to have no plugin system.
13+ Snakemake is one of the most widely adopted workflow systems in scientific computing. It
14+ scales from local execution to clusters and cloud environments, with built-in support
15+ for containers and conda environments. Workflows are defined using a DSL that combines
16+ Make-style rules with Python, and can be exported to CWL for portability.
2617
2718## [ploomber](https://github.com/ploomber/ploomber)
2819
29- General
30-
31- - Strong focus on machine learning pipelines, training, and deployment.
32- - Integration with tools such as MLflow, Docker, AWS Batch.
33- - Tasks can be defined in yaml, python files, Jupyter notebooks or SQL.
34-
35- Pros
36-
37- - Conversion from Jupyter notebooks to tasks via
38- [soorgeon](https://github.com/ploomber/soorgeon).
39-
40- Cons
41-
42- - Programming in Jupyter notebooks increases the risk of coding errors (e.g.
43- side-effects).
44- - Supports parametrizations in form of cartesian products in `yaml` files, but not more
45- powerful parametrizations.
20+ Ploomber focuses on machine learning pipelines with strong integration into MLflow,
21+ Docker, and AWS Batch. Tasks can be defined in YAML, Python files, Jupyter notebooks, or
22+ SQL, and it can convert notebooks into pipeline tasks.
4623
4724## [Waf](https://waf.io)
4825
49- Pros
50-
51- - Mature library.
52- - Can be extended.
53-
54- Cons
55-
56- - Focus on compiling binaries, not research projects.
57- - Bus factor of 1.
26+ Waf is a mature build system primarily designed for compiling software projects. It
27+ handles complex build dependencies and can be extended with Python.
5828
5929## [nextflow](https://github.com/nextflow-io/nextflow)
6030
61- - Tasks are scripted using Groovy which is a superset of Java.
62- - Supports AWS, Google, Azure.
63- - Supports Docker, Shifter, Podman, etc.
31+ Nextflow is a workflow system popular in bioinformatics that runs on AWS, Google Cloud,
32+ and Azure. It uses Groovy (a JVM language) for scripting and has strong support for
33+ containers including Docker, Singularity, and Podman.
6434
6535## [Kedro](https://github.com/kedro-org/kedro)
6636
67- Pros
68-
69- - Mature library, used by some institutions and companies. Created inside McKinsey.
70- - Provides the full package: templates, pipelines, deployment
37+ Kedro is a mature workflow framework developed at McKinsey that provides project
38+ templates, data catalogs, and deployment tooling. It is designed for production machine
39+ learning pipelines with a focus on software engineering best practices.
7140
7241## [pydoit](https://github.com/pydoit/doit)
7342
74- General
75-
76- - A general task runner which focuses on command line tools.
77- - You can think of it as an replacement for make.
78- - Powers Nikola, a static site generator.
43+ pydoit is a general-purpose task runner that serves as a Python replacement for Make. It
44+ focuses on executing command-line tools and powers projects like Nikola, a static site
45+ generator.
7946
8047## [Luigi](https://github.com/spotify/luigi)
8148
82- General
83-
84- - A build system written by Spotify.
85- - Designed for any kind of long-running batch processes.
86- - Integrates with many other tools like databases, Hadoop, Spark, etc..
87-
88- Cons
89-
90- - Very complex interface and a lot of stuff you probably don't need.
91- - [Development](https://github.com/spotify/luigi/graphs/contributors) seems to stall.
49+ Luigi is a workflow system built by Spotify for long-running batch processes. It
50+ integrates with Hadoop, Spark, and various databases for large-scale data pipelines.
51+ Development has slowed in recent years.
9252
9353## [sciluigi](https://github.com/pharmbio/sciluigi)
9454
95- sciluigi aims to be a lightweight wrapper around luigi.
96-
97- Cons
98-
99- - [Development](https://github.com/pharmbio/sciluigi/graphs/contributors) has basically
100- stalled since 2018.
101- - Not very popular compared to its lifetime.
55+ sciluigi is a lightweight wrapper around Luigi aimed at simplifying scientific workflow
56+ development. It reduces some of Luigi's boilerplate for research use cases. Development
57+ has stalled since 2018.
10258
10359## [scipipe](https://github.com/scipipe/scipipe)
10460
105- Cons
61+ SciPipe is a workflow library written in Go for building robust, flexible pipelines
62+ using Flow-Based Programming principles. It compiles workflows to fast binaries and is
63+ designed for bioinformatics and cheminformatics applications involving command-line
64+ tools.
10665
107- - [Development](https://github.com/scipipe/scipipe/graphs/contributors) slowed down.
108- - Written in Go.
66+ ## [SCons](https://github.com/SCons/scons)
10967
110- ## [Scons](https://github.com/SCons/scons)
111-
112- Pros
113-
114- - Mature library.
115-
116- Cons
117-
118- - Seems to have no plugin system.
68+ SCons is a mature, cross-platform software construction tool that serves as an improved
69+ substitute for Make. It uses Python scripts for configuration and has built-in support
70+ for C, C++, Java, Fortran, and automatic dependency analysis.
11971
12072## [pypyr](https://github.com/pypyr/pypyr)
12173
122- General
74+ pypyr is a task-runner for automation pipelines defined in YAML. It provides built-in
75+ steps for common operations like loops, conditionals, retries, and error handling
76+ without requiring custom code, and is often used for CI/CD and DevOps automation.
77+
78+ ## [ZenML](https://github.com/zenml-io/zenml)
12379
124- - A general task-runner with task defined in yaml files.
80+ ZenML is an MLOps framework for building portable ML pipelines that can run on various
81+ orchestrators including Kubernetes, AWS SageMaker, GCP Vertex AI, Kubeflow, and Airflow.
82+ It focuses on productionizing ML workflows with features like automatic
83+ containerization, artifact tracking, and native caching.
12584
126- ## [zenml](https://github.com/zenml-io/zenml)
85+ ## [Flyte](https://github.com/flyteorg/flyte)
12786
128- ## [flyte](https://github.com/flyteorg/flyte)
87+ Flyte is a Kubernetes-native workflow orchestration platform for building
88+ production-grade data and ML pipelines. It provides automatic retries, checkpointing,
89+ failure recovery, and scales dynamically across cloud providers including AWS, GCP, and
90+ Azure.
12991
13092## [pipefunc](https://github.com/pipefunc/pipefunc)
13193
132- A tool for executing graphs made out of functions. More focused on computational
133- compared to workflow graphs.
94+ pipefunc is a lightweight library for creating function pipelines as directed acyclic
95+ graphs (DAGs) in pure Python. It automatically handles execution order, supports
96+ map-reduce operations, parallel execution, and provides resource profiling.
97+
98+ ## [Common Workflow Language (CWL)](https://www.commonwl.org/)
99+
100+ CWL is an open standard for describing data analysis workflows in a portable,
101+ language-agnostic format. Its primary goal is to enable workflows to be written once and
102+ executed across different computing environments—from local workstations to clusters,
103+ cloud, and HPC systems—without modification. Workflows described in CWL can be
104+ registered on [WorkflowHub](https://workflowhub.eu/) for sharing and discovery following
105+ FAIR (Findable, Accessible, Interoperable, Reusable) principles.
106+
107+ CWL is particularly prevalent in bioinformatics and life sciences where reproducibility
108+ across institutions is critical. Tools that support CWL include
109+ [cwltool](https://github.com/common-workflow-language/cwltool) (the reference
110+ implementation), [Toil](https://github.com/DataBiosphere/toil),
111+ [Arvados](https://arvados.org/), and [REANA](https://reanahub.io/). Some workflow
112+ systems like Snakemake and Nextflow can export workflows to CWL format.
113+
114+ pytask is not a CWL-compliant tool because it operates on a fundamentally different
115+ model. CWL describes workflows as graphs of command-line tool invocations where data
116+ flows between tools via files. pytask, in contrast, orchestrates Python functions that
117+ can execute arbitrary code, manipulate data in memory, call APIs, or perform any
118+ operation available in Python. This Python-native approach enables features like
119+ interactive debugging but means pytask workflows cannot be represented in CWL's
120+ command-line-centric specification.
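
The contrast drawn in the last paragraph can be sketched with the standard library alone. This is only an illustration of the two models, not pytask's or cwltool's actual API: the names `run_cli_step`, `uppercase_in_memory`, and `compare_models` are hypothetical, and the "command-line tool" is simulated by a small Python one-liner launched as a subprocess.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def run_cli_step(input_path: Path, output_path: Path) -> None:
    """CWL-style step: an external command reads one file and writes another."""
    # Stand-in for a real command-line tool; it upper-cases the input file.
    script = (
        "import sys; "
        "text = open(sys.argv[1]).read(); "
        "open(sys.argv[2], 'w').write(text.upper())"
    )
    subprocess.run(
        [sys.executable, "-c", script, str(input_path), str(output_path)],
        check=True,
    )


def uppercase_in_memory(text: str) -> str:
    """Function-style step: plain Python operating on an in-memory value."""
    return text.upper()


def compare_models(text: str) -> tuple[str, str]:
    """Run the same transformation through both models and return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "input.txt"
        target = Path(tmp) / "output.txt"
        source.write_text(text)
        run_cli_step(source, target)  # data crosses the step boundary as files
        via_files = target.read_text()
    return via_files, uppercase_in_memory(text)  # data stays in the interpreter
```

The file-based step is fully described by a command line plus its input and output paths, which is the kind of step the paragraph says CWL models. The function-based step can hold arbitrary Python objects and be stepped through in a debugger, which has no direct counterpart in that description.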