Commit e4e6163

update intro text
1 parent c334ad7 commit e4e6163

3 files changed: +52 -40 lines changed

README.md

+28 -20
@@ -19,43 +19,51 @@
 
 ---
 
-`zappend` is a tool written in Python that is used for robustly creating and updating
-Zarr datacubes from smaller dataset slices. It is built on top of the awesome Python
-packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
+`zappend` is a tool written in Python that is used for robustly creating and
+updating Zarr datacubes from smaller dataset slices. It is built on top of the
+awesome Python packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
 
 ## Motivation
 
-The objective of `zappend` is to address recurring memory issues when generating large
-geospatial datacubes using the [Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html)
-by subsequently concatenating data slices along an append dimension, e.g., `time`
-(the default) for geospatial satellite observations.
-Each append step is atomic, that is, the append operation is a transaction that can be
-rolled back, in case the append operation fails. This ensures integrity of the target
-data cube.
+The objective of `zappend` is to empower geodata scientists and developers to
+robustly create large data cubes. The tool performs transaction-based dataset
+appends to existing data cubes in the
+[Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html). If an error
+occurs during an append step — typically due to I/O problems or out-of-memory
+conditions — `zappend` will automatically roll back the operation, ensuring that
+the existing data cube maintains its structural integrity. The design drivers
+behind `zappend` are, first, ease of use and, second, high configurability
+regarding filesystems, data source types, data cube outline and encoding.
+
+The tool comprises a command-line interface, a Python API for programmatic
+control, and comprehensible documentation to guide users effectively.
+You can easily install `zappend` as a plain Python package using either
+`pip install zappend` or `conda install -c conda-forge zappend`.
 
 ## Features
 
 The `zappend` tool provides the following features:
 
-* **Locking**: While the target dataset is being modified, a file lock is created,
-effectively preventing concurrent dataset modifications.
+* **Locking**: While the target dataset is being modified, a file lock is
+created, effectively preventing concurrent dataset modifications.
 * **Transaction-based dataset appends**: On failure during an append step,
 the transaction is rolled back, so that the target dataset remains valid and
 preserves its integrity.
-* **Filesystem transparency**: The target dataset may be generated and updated in
-any writable filesystems supported by the
+* **Filesystem transparency**: The target dataset may be generated and updated
+in any writable filesystems supported by the
 [fsspec](https://filesystem-spec.readthedocs.io/) package.
 The same holds for the slice datasets to be appended.
 * **Dataset polling**: The tool can be configured to wait for slice datasets to
 become available.
-* **CLI and Python API**: The tool can be used in a shell using the [`zappend`](cli.md)
-command or from Python. When used from Python using the
-[`zappend()`](api.md) function, slice datasets can be passed as local file paths,
-URIs, as datasets of type
+* **Dynamic attributes**: Use syntax `{{ expression }}` to update the target
+dataset with dynamically computed attribute values.
+* **CLI and Python API**: The tool can be used in a shell using the
+[`zappend`](cli.md) command or from Python. When used from Python using the
+[`zappend()`](api.md) function, slice datasets can be passed as local file
+paths, URIs, as datasets of type
 [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
 [zappend.api.SliceSource](https://bcdev.github.io/zappend/api/#class-slicesource) objects.
 
-
-
+
 More about zappend can be found in its
 [documentation](https://bcdev.github.io/zappend/).

docs/index.md

+23 -19
@@ -1,40 +1,44 @@
 <!--- Align following section with README.md -->
 
-# The zappend Tool
+# zappend Documentation
 
-`zappend` is a tool written in Python that is used for robustly creating and updating
-Zarr datacubes from smaller dataset slices. It is built on top of the awesome Python
-packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
+`zappend` is a tool written in Python that is used for robustly creating and
+updating Zarr datacubes from smaller dataset slices. It is built on top of the
+awesome Python packages [xarray](https://docs.xarray.dev/) and [zarr](https://zarr.readthedocs.io/).
 
 ## Motivation
 
-The objective of `zappend` is to address recurring memory issues when generating large
-geospatial datacubes using the [Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html)
-by subsequently concatenating data slices along an append dimension, e.g., `time`
-(the default) for geospatial satellite observations.
-Each append step is atomic, that is, the append operation is a transaction that can be
-rolled back, in case the append operation fails. This ensures integrity of the target
-data cube.
+The objective of `zappend` is to empower geodata scientists and developers to
+robustly create large data cubes. The tool performs transaction-based dataset
+appends to existing data cubes in the
+[Zarr format](https://zarr.readthedocs.io/en/stable/spec/v2.html). If an error
+occurs during an append step — typically due to I/O problems or out-of-memory
+conditions — `zappend` will automatically roll back the operation, ensuring that
+the existing data cube maintains its structural integrity. The design drivers
+behind `zappend` are, first, ease of use and, second, high configurability
+regarding filesystems, data source types, data cube outline and encoding.
 
 ## Features
 
 The `zappend` tool provides the following features:
 
-* **Locking**: While the target dataset is being modified, a file lock is created,
-effectively preventing concurrent dataset modifications.
+* **Locking**: While the target dataset is being modified, a file lock is
+created, effectively preventing concurrent dataset modifications.
 * **Transaction-based dataset appends**: On failure during an append step,
 the transaction is rolled back, so that the target dataset remains valid and
 preserves its integrity.
-* **Filesystem transparency**: The target dataset may be generated and updated in
-any writable filesystems supported by the
+* **Filesystem transparency**: The target dataset may be generated and updated
+in any writable filesystems supported by the
 [fsspec](https://filesystem-spec.readthedocs.io/) package.
 The same holds for the slice datasets to be appended.
 * **Dataset polling**: The tool can be configured to wait for slice datasets to
 become available.
-* **CLI and Python API**: The tool can be used in a shell using the [`zappend`](cli.md)
-command or from Python. When used from Python using the
-[`zappend()`](api.md) function, slice datasets can be passed as local file paths,
-URIs, as datasets of type
+* **Dynamic attributes**: Use syntax `{{ expression }}` to update the target
+dataset with dynamically computed attribute values.
+* **CLI and Python API**: The tool can be used in a shell using the
+[`zappend`](cli.md) command or from Python. When used from Python using the
+[`zappend()`](api.md) function, slice datasets can be passed as local file
+paths, URIs, as datasets of type
 [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html), or as custom
 [zappend.api.SliceSource](https://bcdev.github.io/zappend/api/#class-slicesource) objects.

setup.cfg

+1 -1
@@ -7,7 +7,7 @@ long_description = file: README.md
 long_description_content_type = text/markdown
 keywords = analysis ready data, data science, datacube, xarray, zarr
 license = MIT
-url = https://bcdev.github.io/zappend
+url = https://github.com/bcdev/zappend
 project_urls =
     Documentation = https://bcdev.github.io/zappend/
     Issues = https://github.com/bcdev/zappend/issues
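
The locking and rollback behaviour named in the README's feature list can be illustrated with a small standard-library sketch. This is not how `zappend` is implemented. The names `file_lock` and `append_slice`, the `.lock` and `.bak` suffixes, and the plain-text target file are all hypothetical stand-ins chosen to show the idea of a lock plus a snapshot that is restored on failure.

```python
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path: Path):
    """Hold an exclusive lock file for the duration of the block (hypothetical helper)."""
    # O_EXCL makes creation atomic: a second writer gets FileExistsError.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def append_slice(target: Path, values: list, fail: bool = False) -> None:
    """Append values to a text file, restoring the previous state on any error.

    The `fail` flag only exists to simulate an I/O problem mid-append.
    """
    with file_lock(target.with_suffix(".lock")):
        backup = target.with_suffix(".bak")
        shutil.copy2(target, backup)  # snapshot used for rollback
        try:
            with target.open("a") as f:
                for v in values:
                    f.write(f"{v}\n")
                if fail:
                    raise IOError("simulated I/O problem")
        except Exception:
            shutil.copy2(backup, target)  # roll back to the snapshot
            raise
        finally:
            backup.unlink()  # the snapshot is no longer needed
```

A failed call such as `append_slice(path, [4.0], fail=True)` re-raises the error but leaves the target file exactly as it was before the call, and the lock file is removed in both the success and the failure case.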