Releases: pytask-dev/pytask
v0.4.2
Highlights
This release contains a new feature and some improvements for users.
- 🚀 The new feature is the pytask.DataCatalog that allows users to manage dependencies and products in projects more easily. Read the tutorial to get started. 🚀
- File changes are now detected by hashes instead of modification timestamps. It should prevent accidental executions when working with cloud storage providers like Dropbox or OneDrive and in many other situations. To save runtime, pytask uses a cache for the hashes when the modification timestamp has not changed.
- Nodes now have signatures that separate how nodes are named and displayed from how nodes are identified internally. If you have written a custom node, please update it according to the how-to guide.
- All of pytask's internal files are now stored in a .pytask folder in your project. The file .pytask.sqlite3 is moved to this location as well. Add .pytask to your .gitignore to prevent accidentally committing the folder.
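The idea behind hash-based change detection with a timestamp-keyed cache can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not pytask's implementation: the names file_hash, has_changed, set_mtime, and _HASH_CACHE are hypothetical, and the cache lives in memory only.

```python
import hashlib
import os
import tempfile
from pathlib import Path

# Hypothetical in-memory cache: path -> (mtime in nanoseconds, SHA-256 hex digest).
_HASH_CACHE = {}


def file_hash(path: Path) -> str:
    """Return the file's SHA-256 digest, reusing the cached value if mtime is unchanged."""
    mtime = path.stat().st_mtime_ns
    cached = _HASH_CACHE.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # Timestamp unchanged: skip reading the file again.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    _HASH_CACHE[path] = (mtime, digest)
    return digest


def has_changed(path: Path, recorded_digest: str) -> bool:
    """A file counts as changed only when its content hash differs."""
    return file_hash(path) != recorded_digest


def set_mtime(path: Path, ns: int) -> None:
    """Set the modification time explicitly so the demo is deterministic."""
    os.utime(path, ns=(ns, ns))


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data.txt"
    data.write_text("hello")
    base = data.stat().st_mtime_ns
    recorded = file_hash(data)

    # Same content, newer timestamp (for example, a sync client touched the file).
    data.write_text("hello")
    set_mtime(data, base + 10_000_000_000)
    touched_only = has_changed(data, recorded)

    # Different content, newer timestamp.
    data.write_text("hello, world")
    set_mtime(data, base + 20_000_000_000)
    edited = has_changed(data, recorded)

print(touched_only, edited)
```

In this sketch, a file whose timestamp moved but whose content stayed the same is reported as unchanged, while a real edit is detected. The cache only avoids rereading files whose timestamp did not move.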
What's Changed
- Simplify building the plugin manager. by @tobiasraabe in #449
- Rename graph.py to dag_command.py and improve collect_command.py. by @tobiasraabe in #451
- Remove more .svgs and replace them with animations. by @tobiasraabe in #454
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #452
- [automated] Update plugin list by @github-actions in #453
- Add more explanation when PNode.load() fails during execution. by @tobiasraabe in #455
- Refer to source code on Github in API docs. by @tobiasraabe in #456
- Refactor code for format_node_name. by @tobiasraabe in #457
- Add hook to sort __all__. by @tobiasraabe in #459
- Simplify removing internal tracebacks from exceptions with cause. by @tobiasraabe in #460
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #461
- Fix import error for pluggy<1.3. by @tobiasraabe in #462
- Raise error when function is defined outside the loop body. by @tobiasraabe in #463
- Improve pins. by @tobiasraabe in #464
- Test that internal tracebacks are removed by reports. by @tobiasraabe in #465
- Add is_product to PNode.load(). by @tobiasraabe in #472
- Add a data catalog. by @tobiasraabe in #419
- Hash files instead of relying on modification timestamps. by @tobiasraabe in #469
- Move .pytask.sqlite3 to .pytask/. by @tobiasraabe in #470
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #471
- Update PyPI action. by @tobiasraabe in #477
- Add node signatures. by @tobiasraabe in #473
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #476
- Add snapshot tests. by @tobiasraabe in #475
- Switch from black to ruff-format. by @tobiasraabe in #478
- Rework reports and tracebacks. by @tobiasraabe in #474
- Give skips higher precedence than ancestor failed as outcome. by @tobiasraabe in #479
- Remove checks for missing root nodes. by @tobiasraabe in #480
- Improve coverage. by @tobiasraabe in #481
- Fix handling of names and signatures of PythonNodes. by @tobiasraabe in #482
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
Of course, it's a mandatory bug fix release after a bigger release.
Using the product annotation, Annotated[..., Product], did not work with multiple products.
- Fix setting the name of PythonNode. by @tobiasraabe in #443
- Move content of setup.cfg to pyproject.toml. by @tobiasraabe in #444
- [automated] Update plugin list by @github-actions in #445
- Fix when multiple product annotations are used. by @tobiasraabe in #448
- Fix PythonNode when used as return. by @tobiasraabe in #446
- Simplify the tree_map code for generating the DAG. by @tobiasraabe in #447
Full Changelog: v0.4.0...v0.4.1
v0.4.0
News
pytask became three years old in July, which is a suitable event to rethink pytask's design and blow dust off of some of its oldest components.
Here are the highlights of v0.4.0 🚀 ⭐
Highlights
New interfaces for products.
Every argument can be declared as a product with the new Product annotation. The path can be passed as a default value.
from pathlib import Path
from pytask import Product
from typing_extensions import Annotated

def task_hello_earth(path: Annotated[Path, Product] = Path("hello_earth.txt")):
    path.write_text("Hello, earth!")
More explanation can be found at https://tinyurl.com/yrezszr4.
It is also possible to use the return of the task function as a product, which allows wrapping any third-party function as a task function. Read more about it here: https://tinyurl.com/pytask-return.
from pathlib import Path
from pytask import Product
from typing_extensions import Annotated

def task_hello_earth() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"
Every task argument is a dependency
In older pytask versions, only paths were treated as task dependencies. That meant when you passed other arguments to the task, and they changed, it did not trigger a rerun of the task.
Now, every argument to a task can be a dependency, and you can hash them if they should trigger a rerun. It is explained in https://tinyurl.com/pytask-hash.
from pathlib import Path
from typing import Annotated
from pytask import Product
from pytask import PythonNode

def task_example(
    text: Annotated[str, PythonNode(value="Hello, World", hash=True)],
    path: Annotated[Path, Product] = Path("file.txt"),
) -> None:
    path.write_text(text)
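Why a hashed value can trigger a rerun can be sketched without pytask. The snippet below is only an illustration under stated assumptions: stable_hash and needs_rerun are hypothetical helpers, and the sketch handles strings only. It uses hashlib because Python randomizes the built-in hash() of strings per interpreter process by default, so that value cannot be compared across runs.

```python
import hashlib
from typing import Optional


def stable_hash(value: str) -> str:
    """Hash a string so that the digest is identical across interpreter runs."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def needs_rerun(current_value: str, recorded_digest: Optional[str]) -> bool:
    """Rerun when nothing was recorded yet or the value's digest differs."""
    return recorded_digest is None or stable_hash(current_value) != recorded_digest


# Digest stored after a (hypothetical) previous successful run.
recorded = stable_hash("Hello, World")

same_value = needs_rerun("Hello, World", recorded)
new_value = needs_rerun("Hello, Earth", recorded)
first_run = needs_rerun("Hello, World", None)
print(same_value, new_value, first_run)
```

An unchanged value leaves the task untouched, while a changed value or a missing record marks the task for execution.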
A new functional interface
The functional interface for pytask has been reworked and accepts a list of task functions. You can use it within your terminal or a Jupyter notebook. Read this guide to learn more about it: https://tinyurl.com/pytask-functional.
from pathlib import Path
from typing import Annotated
from pytask import build

def create_text() -> Annotated[str, Path("hello_earth.txt")]:
    return "Hello, earth!"

session = build(tasks=[create_text])
Custom Nodes through Protocols
In the newest version, nodes (dependencies and products) and tasks follow protocols. It allows for customizations like PickleNodes that store any Python object as a pickle file and inject the object into the task when used as a dependency. It is explained in more detail in this guide: https://tinyurl.com/pytask-custom-nodes.
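The protocol idea can be sketched with typing.Protocol from the standard library. Everything named here is an assumption made for illustration: NodeLike and PickleFileNode are hypothetical, and the members (name, state, load, save) are a guess at a reasonable shape, so pytask's real protocols may differ. Consult the linked guide for the actual interface.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class NodeLike(Protocol):
    """Hypothetical node protocol; not pytask's actual definition."""

    name: str

    def state(self) -> Optional[str]: ...

    def load(self) -> Any: ...

    def save(self, value: Any) -> None: ...


class PickleFileNode:
    """Stores any picklable object in a file; satisfies NodeLike structurally."""

    def __init__(self, name: str, path: Path) -> None:
        self.name = name
        self.path = path

    def state(self) -> Optional[str]:
        # None signals that the file does not exist yet.
        if self.path.exists():
            return str(self.path.stat().st_mtime_ns)
        return None

    def load(self) -> Any:
        with self.path.open("rb") as f:
            return pickle.load(f)

    def save(self, value: Any) -> None:
        with self.path.open("wb") as f:
            pickle.dump(value, f)


with tempfile.TemporaryDirectory() as tmp:
    node = PickleFileNode("numbers", Path(tmp) / "numbers.pkl")
    state_before = node.state()
    node.save({"a": [1, 2, 3]})
    loaded = node.load()
    follows_protocol = isinstance(node, NodeLike)

print(state_before, loaded, follows_protocol)
```

The class never inherits from NodeLike. It only has to provide the same members, which is what makes protocols convenient for plugging in custom storage.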
Other notable changes
- Python 3.12 is supported, and support for Python 3.7 is dropped.
- @pytask.mark.depends_on and @pytask.mark.produces are deprecated. There are better options to define dependencies and products explained in https://tinyurl.com/yrezszr4.
- @pytask.mark.task is also deprecated and replaced by from pytask import task and @task.
What's Changed
- Remove Python 3.7 support and add a new action for mamba. by @tobiasraabe in #323
- Replace pony with sqlalchemy>=1.4.36. by @tobiasraabe in #387
- Remove @pytask.mark.parametrize. by @tobiasraabe in #391
- Parse dependencies from all args if depends_on is not used. by @tobiasraabe in #384
- Add products with typing.Annotation. by @tobiasraabe in #394
- Refactor pybaum to _pytask.tree_util. by @tobiasraabe in #395
- Replace pybaum with optree and add paths to PythonNode names. by @tobiasraabe in #396
- Add support for NamedTuple and attrs classes in @pytask.mark.task(kwargs=...). by @tobiasraabe in #397
- Deprecate decorators for depends_on and produces. by @tobiasraabe in #398
- Use protocols instead of ABCs. by @tobiasraabe in #402
- Allow tasks to return products. by @tobiasraabe in #404
- Tracking changes in v0.4.0. by @tobiasraabe in #400
- Bump peter-evans/create-pull-request from 5.0.1 to 5.0.2 by @dependabot in #390
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #388
- Allow to use prefix trees as nodes to parse function returns. by @tobiasraabe in #406
- Remove .value from Node protocol. by @tobiasraabe in #408
- Make .from_annot an optional feature of nodes. by @tobiasraabe in #409
- Allow to pass functions to PythonNode(hash=...). by @tobiasraabe in #410
- Add protocols for tasks. by @tobiasraabe in #412
- Remove scripts to generate .svgs. by @tobiasraabe in #413
- Allow more ruff rules. by @tobiasraabe in #414
- A new functional interface. by @tobiasraabe in #411
- Deprecate @pytask.mark.task in favor of @pytask.task. by @tobiasraabe in #417
- Simplify and fix code in dag.py. by @tobiasraabe in #418
- Convert DeprecationWarning to FutureWarning for deprecated decorators. by @tobiasraabe in #420
- Remove deprecation warning for produces. by @tobiasraabe in #421
- Document new interface. by @tobiasraabe in #392
- Fix import_path. by @tobiasraabe in #424
- Publish pytask.tree_util. by @tobiasraabe in #426
- Fix type annotations of task.depends_on and task.produces. by @tobiasraabe in #427
- Document functional interface. by @tobiasraabe in #423
- Update example in README.md. by @tobiasraabe in #428
- Add better error message when node.state() throws error during DAG validation. by @tobiasraabe in #429
- Update parts of the documentation. by @tobiasraabe in #430
- Enable colors in WSL. by @tobiasraabe in #431
- Fix type checking for pytask.mark.x. by @tobiasraabe in #432
- Fix ids of PythonNodes. by @tobiasraabe in #433
- Add support for Python 3.12. by @tobiasraabe in #434
- Fix detection of task functions. by @tobiasraabe in #437
- Clarify some types. by @tobiasraabe in #438
- Refine typing. by @tobiasraabe in #440
Full Changelog: v0.3.2...v0.4.0
v0.4.0rc4
The last pre-release.
v0.4.0rc3
A couple of new fixes. Most notably a fix for the ids of PythonNodes that should prevent rebuilds.
v0.4.0rc2
Another release candidate that fixes the installation via conda and adds full support for pytask-parallel.
v0.4.0rc1
This is the first release candidate for the v0.4.* release series.
The final release still requires some changes. For example, the documentation needs to be extended. But, the essential parts are already there, and it is time to collect some final feedback! Let me know what you think and what needs to be improved. You can comment in the discussion for this release #422.
To install the pre-release, use
$ pip install pytask --pre
$ conda install -c "conda-forge/label/pytask_rc" pytask
Now, let's take a look at the changes.
What's Changed
New
- Dependencies and products of tasks have new interfaces that are explained in this tutorial.
- You can now also declare products through the return value of task functions. Follow this guide.
- If you have inputs to task functions that should be hashed to detect any changes, follow this guide.
- Before, only pathlib.Paths received special treatment as dependencies or products to task functions. Now, it is possible to define your own nodes that simplify, for example, loading pickle files as this guide explains. But many more extensions are possible, like defining data in an S3 bucket as a dependency or product.
- The functional interface has been reworked and now accepts tasks directly, allowing you to execute pytask on the command line or in Jupyter notebooks. The documentation must still be written, but here is your starting point.
Removals
- Python 3.7 is no longer supported.
- @pytask.mark.parametrize is removed. Follow this tutorial instead.
Deprecations
- @pytask.mark.depends_on and @pytask.mark.produces are deprecated and will be removed in v0.5.0.
- @pytask.mark.task is deprecated. Use @pytask.task instead.
- Paths defined as strings are deprecated and should be replaced with proper pathlib.Path objects.
Full list of changes
- Remove Python 3.7 support and add a new action for mamba. by @tobiasraabe in #323
- Replace pony with sqlalchemy>=1.4.36. by @tobiasraabe in #387
- Remove @pytask.mark.parametrize. by @tobiasraabe in #391
- Parse dependencies from all args if depends_on is not used. by @tobiasraabe in #384
- Add products with typing.Annotation. by @tobiasraabe in #394
- Refactor pybaum to _pytask.tree_util. by @tobiasraabe in #395
- Replace pybaum with optree and add paths to PythonNode names. by @tobiasraabe in #396
- Add support for NamedTuple and attrs classes in @pytask.mark.task(kwargs=...). by @tobiasraabe in #397
- Deprecate decorators for depends_on and produces. by @tobiasraabe in #398
- Use protocols instead of ABCs. by @tobiasraabe in #402
- Allow tasks to return products. by @tobiasraabe in #404
- Tracking changes in v0.4.0. by @tobiasraabe in #400
- Bump peter-evans/create-pull-request from 5.0.1 to 5.0.2 by @dependabot in #390
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #388
- Allow to use prefix trees as nodes to parse function returns. by @tobiasraabe in #406
- Remove .value from Node protocol. by @tobiasraabe in #408
- Make .from_annot an optional feature of nodes. by @tobiasraabe in #409
- Allow to pass functions to PythonNode(hash=...). by @tobiasraabe in #410
- Add protocols for tasks. by @tobiasraabe in #412
- Remove scripts to generate .svgs. by @tobiasraabe in #413
- Allow more ruff rules. by @tobiasraabe in #414
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #407
- A new functional interface. by @tobiasraabe in #411
- Deprecate @pytask.mark.task in favor of @pytask.task. by @tobiasraabe in #417
- Simplify and fix code in dag.py. by @tobiasraabe in #418
- Convert DeprecationWarning to FutureWarning for deprecated decorators. by @tobiasraabe in #420
- Remove deprecation warning for produces. by @tobiasraabe in #421
- Document new interface. by @tobiasraabe in #392
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #415
Full Changelog: v0.3.2...v0.4.0rc1
v0.3.2
Highlights
This release contains the following highlights:
- Previously, if you accidentally hit the save button on an unchanged task file, pytask would rerun the task although nothing had changed. Now, pytask does not rerun the task because it also compares the hashes of task files, not only the modification timestamps.
- If you want to enforce rerunning tasks, there is now a --force flag. Take the function name/id of the task and run pytask -k <task id> --force, and the task and its necessary tasks will be executed. Or delete a product from the task you want to rerun.
- The import mechanism for task modules has been reworked, and errors resolved. Thanks to @NickCrews!
Additionally, the @pytask.mark.parametrize decorator is deprecated and will be removed in pytask v0.4. If you use the decorator, you have two options:
- (Recommended) Upgrade your code to the new approach for repeating tasks described in this tutorial.
- Or, pin pytask to pytask<0.4 and silence the deprecation warning by setting silence_parametrize_deprecation = true in your pyproject.toml under [tool.pytask.ini_options].
What's Changed
- Update version numbers in animations. by @tobiasraabe in #345
- Add dependabot for GitHub actions. by @tobiasraabe in #348
- Publish db. by @tobiasraabe in #352
- Refactor nodes. by @tobiasraabe in #355
- Add -f/--force to force executing tasks. by @tobiasraabe in #354
- Add hashing to avoid re-executing tasks when modification times changed. by @tobiasraabe in #357
- Update update_plugin_list.py. by @tobiasraabe in #364
- Rework panel with sphinx-design. by @tobiasraabe in #365
- Add light and dark logos for the documentation. by @tobiasraabe in #366
- Fix the panel on the index page of the documentation. by @tobiasraabe in #367
- Fix error introduced in #364. by @tobiasraabe in #369
- Revert change turning Node.state() into a hook. by @tobiasraabe in #370
- Rename Node back to MetaNode. by @tobiasraabe in #371
- Clearer documentation for pytask dag -o. by @tobiasraabe in #376
- Conditionally skip tests on MacOS. by @tobiasraabe in #378
- Deprecate @pytask.mark.parametrize. by @tobiasraabe in #381
- Fix the import mechanism for task modules. by @NickCrews in #373
- Update changes. by @tobiasraabe in #383
New Contributors
- @dependabot made their first contribution in #349
- @NickCrews made their first contribution in #373
Full Changelog: v0.3.1...v0.3.2
v0.3.1
What's Changed
- Fix bug when passing no path on the command line. by @tobiasraabe in #337
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #341
- [automated] Update plugin list by @github-actions in #340
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Highlights
This release includes a breaking change due to internal refactorings. The change affects how command line options and the configuration file are loaded and validated. For users, the changes are subtle; the help pages of the commands have prettier options and default values.
Make sure to upgrade pytask and the plugins to v0.3 or pin the packages to <0.3.
There is some delay until the updates for pytask and its plugins are available. Be aware of errors when using mixed v0.2 and v0.3 installations.
The most significant benefit is for developers who want to add command line options and configuration values. The parsing can now be handled with proper click types, for example, EnumChoice to implement choice options. Defaults are attached to command line options and are automatically displayed in the help pages.
What's Changed
- Update workflow status badge. by @tobiasraabe in #326
- Deprecate INI configurations. by @tobiasraabe in #313
- Add ruff to linters. by @tobiasraabe in #329
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #328
- Refactor database.py. by @tobiasraabe in #332
- Add a guide to migrate from scripts to pytask. by @tobiasraabe in #330
- Upgrade attrs and related code. by @tobiasraabe in #333
- Use ruff with all rules selected by default. by @pre-commit-ci in #331
- Set target version for ruff. by @tobiasraabe in #334
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #335
- Extend the API documentation. by @tobiasraabe in #336
Full Changelog: v0.2.7...v0.3.0