@@ -19,43 +19,51 @@
 
 ---
 
-`zappend` is a tool written in Python that is used for robustly creating and updating
-Zarr datacubes from smaller dataset slices. It is built on top of the awesome Python
-packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
+`zappend` is a tool written in Python that is used for robustly creating and
+updating Zarr datacubes from smaller dataset slices. It is built on top of the
+awesome Python packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
 
 ## Motivation
 
-The objective of `zappend` is to address recurring memory issues when generating large
-geospatial datacubes using the [Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html)
-by subsequently concatenating data slices along an append dimension, e.g., `time`
-(the default) for geospatial satellite observations.
-Each append step is atomic, that is, the append operation is a transaction that can be
-rolled back, in case the append operation fails. This ensures integrity of the target
-data cube.
+The objective of `zappend` is to empower geodata scientists and developers to
+robustly create large data cubes. The tool performs transaction-based dataset
+appends to existing data cubes in the
+[Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html). If an error
+occurs during an append step (typically due to I/O problems or out-of-memory
+conditions), `zappend` automatically rolls back the operation, ensuring that
+the existing data cube maintains its structural integrity. The design drivers
+behind `zappend` are, first, ease of use and, second, high configurability
+regarding filesystems, data source types, data cube outline, and encoding.
+
+The tool comprises a command-line interface, a Python API for programmatic
+control, and comprehensible documentation to guide users effectively.
+You can easily install `zappend` as a plain Python package using either
+`pip install zappend` or `conda install -c conda-forge zappend`.
 
 ## Features
 
 The `zappend` tool provides the following features:
 
-* **Locking**: While the target dataset is being modified, a file lock is created,
-  effectively preventing concurrent dataset modifications.
+* **Locking**: While the target dataset is being modified, a file lock is
+  created, effectively preventing concurrent dataset modifications.
 * **Transaction-based dataset appends**: On failure during an append step,
   the transaction is rolled back, so that the target dataset remains valid and
   preserves its integrity.
-* **Filesystem transparency**: The target dataset may be generated and updated in
-  any writable filesystems supported by the
+* **Filesystem transparency**: The target dataset may be generated and updated
+  in any writable filesystem supported by the
   [fsspec](https://filesystem-spec.readthedocs.io/) package.
   The same holds for the slice datasets to be appended.
 * **Dataset polling**: The tool can be configured to wait for slice datasets to
   become available.
-* **CLI and Python API**: The tool can be used in a shell using the [`zappend`](cli.md)
-  command or from Python. When used from Python using the
-  [`zappend()`](api.md) function, slice datasets can be passed as local file paths,
-  URIs, as datasets of type
+* **Dynamic attributes**: Use the syntax `{{ expression }}` to update the target
+  dataset with dynamically computed attribute values.
+* **CLI and Python API**: The tool can be used in a shell via the
+  [`zappend`](cli.md) command or from Python. When used from Python via the
+  [`zappend()`](api.md) function, slice datasets can be passed as local file
+  paths, as URIs, as datasets of type
   [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
   [zappend.api.SliceSource](https://bcdev.github.io/zappend/api/#class-slicesource) objects.
 
-
-
+
 More about zappend can be found in its
 [documentation](https://bcdev.github.io/zappend/).
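The **Locking** and **Transaction-based dataset appends** features described above can be illustrated with a small, stdlib-only sketch. It is not `zappend`'s implementation and uses neither Zarr, xarray, nor fsspec. Instead, it assumes a plain local directory as a stand-in for the target data cube, a sibling `<target>.lock` file as the lock (the naming is an assumption), and flat file names only. The hypothetical function `append_transactionally()` refuses to run while the lock file exists, backs up every file it is about to overwrite, and on any exception deletes newly created files and restores the backups before re-raising. It only rolls back on exceptions raised in the same process and is not crash-safe.

```python
import os
import shutil
import tempfile
from pathlib import Path


class LockError(RuntimeError):
    """Raised when the target is already locked by another writer."""


def append_transactionally(target: Path, new_files: dict) -> None:
    """Write `new_files` (flat file name -> bytes) into `target`, all or nothing.

    Hypothetical sketch: `target` is a plain directory standing in for a
    data cube; this is not how zappend itself is implemented.
    """
    target.mkdir(parents=True, exist_ok=True)
    lock_path = target.with_name(target.name + ".lock")
    try:
        # O_EXCL makes creation fail if the lock file already exists,
        # so only one writer can proceed at a time.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockError(f"{target} is locked: {lock_path} exists") from None
    os.close(fd)

    backup = Path(tempfile.mkdtemp(prefix="rollback-"))
    created = []  # files that did not exist before this transaction
    try:
        for name, data in new_files.items():
            path = target / name
            if path.exists():
                # Save the current version before overwriting it.
                shutil.copy2(path, backup / name)
            else:
                created.append(path)
            path.write_bytes(data)
    except BaseException:
        # Roll back: remove newly created files, then restore overwritten ones.
        for path in created:
            path.unlink(missing_ok=True)
        for saved in backup.iterdir():
            shutil.copy2(saved, target / saved.name)
        raise
    finally:
        # Always discard the backup and release our own lock.
        shutil.rmtree(backup, ignore_errors=True)
        lock_path.unlink()
```

For example, after a first call writes `a.bin`, a second call with `{"a.bin": b"2", "b.bin": None}` fails with a `TypeError` on `b.bin`; the rollback leaves `a.bin` with its original contents, no `b.bin`, and no leftover lock file. If the lock file already exists, the function raises `LockError` without touching the target or removing the other writer's lock.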