feat: Implement ZEP 8 URL syntax support for zarr-python #3369
base: main
Changes from 18 commits
0e179d3
c5aefd3
26192cb
b67edcf
dc5acd3
227c214
ebe9660
3b55281
c8941ba
2126a3b
0cbeb72
14c2817
20a35f0
7533d95
542c3ec
4e8a925
0315e0b
5666478
35526a5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| Add support for ZEP8 URL syntax for store discovery and chaining of store adapters. | ||
|
|
||
| This feature introduces URL-based storage specification following `ZEP 8: Zarr URL Specification`_, | ||
| allowing users to specify complex storage configurations using concise URL syntax with chained adapters. | ||
|
|
||
| Key additions: | ||
|
|
||
| * **URL syntax support**: Use pipe (``|``) characters to chain storage adapters, e.g., ``file:data.zip|zip`` | ||
| * **Built-in adapters**: Support for ``file``, ``memory``, ``zip``, ``s3``, ``https``, ``gcs`` schemes | ||
| * **Store adapter registry**: New ``zarr.registry.list_store_adapters()`` function to discover available adapters | ||
| * **Extensible architecture**: Custom store adapters can be registered via entry points | ||
|
|
||
| Examples:: | ||
|
|
||
| # Basic ZIP file storage | ||
| zarr.open_array("file:data.zip|zip", mode='w', shape=(10, 10), dtype="f4") | ||
|
|
||
| # In-memory storage | ||
| zarr.open_array("memory:", mode='w', shape=(5, 5), dtype="i4") | ||
|
|
||
| # Remote ZIP file | ||
| zarr.open_array("s3://bucket/data.zip|zip", mode='r') | ||
|
|
||
| # 3rd-party store adapter (icechunk on S3) | ||
| zarr.open_group("s3://bucket/repo|icechunk", mode='r') | ||
|
|
||
| .. _ZEP 8\: Zarr URL Specification: https://zarr.dev/zeps/draft/ZEP0008.html | ||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -198,6 +198,28 @@ To open an existing array from a ZIP file:: | |||
| [0.4335856 , 0.7565437 , 0.7828931 , ..., 0.48119593, 0.66220033, | ||||
| 0.6652362 ]], shape=(100, 100), dtype=float32) | ||||
|
|
||||
| URL-based Storage (ZEP 8) | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|
|
||||
| Zarr supports URL-based storage following the ZEP 8 specification, which allows you to specify storage locations using URLs with chained adapters:: | ||||
|
Comment on lines +201 to +204: Needs a link to the more extensive docs. |
||||
|
|
||||
| >>> # Store data directly in a ZIP file using ZEP 8 URL syntax | ||||
| >>> z = zarr.open_array("file:data/example-zep8.zip|zip", mode='w', shape=(50, 50), chunks=(10, 10), dtype="f4") | ||||
|
Review comment: should probably be adding a .zarr in here somewhere, so we aren't creating otherwise unopenable zip files |
||||
| >>> z[:, :] = np.random.random((50, 50)) | ||||
| >>> | ||||
| >>> # Read it back | ||||
| >>> z2 = zarr.open_array("file:data/example-zep8.zip|zip", mode='r') | ||||
| >>> z2.shape | ||||
| (50, 50) | ||||
|
|
||||
| ZEP 8 URLs use pipe (``|``) characters to chain storage adapters together: | ||||
|
|
||||
| - ``file:path.zip|zip`` - ZIP file on local filesystem | ||||
| - ``s3://bucket/data.zip|zip`` - ZIP file in S3 bucket | ||||
|
Comment on lines +217 to +218: we should have a more deeply nested thing, to actually show chaining of adapters. |
||||
| - ``memory:`` - In-memory storage | ||||
|
Review comment (suggested change): not an example of piping |
||||
|
|
||||
| This provides a concise way to specify complex storage configurations without explicitly creating store objects. | ||||
|
Review comment: move to first paragraph of section |
||||
|
|
||||
| Read more about Zarr's storage options in the :ref:`User Guide <user-guide-storage>`. | ||||
|
|
||||
| Next Steps | ||||
|
|
||||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -145,6 +145,87 @@ Here's an example of using ObjectStore for accessing remote data: | |||||||
| .. warning:: | ||||||||
| The :class:`zarr.storage.ObjectStore` class is experimental. | ||||||||
|
|
||||||||
| URL-based Storage (ZEP 8) | ||||||||
|
Review comment: I think what this section is missing is a showcase of the equivalent zarr-python code, to put it in terms people are more familiar with. So each section would be:
zarr.open_array("file:zep8-data.zip|zip" ....)
# is equivalent to
zarr.open_array(zarr.storage.ZipStore(...)...) |
||||||||
| ------------------------- | ||||||||
|
|
||||||||
| Zarr-Python supports URL-based storage specification following `ZEP 8: Zarr URL Specification`_. | ||||||||
| This allows you to specify complex storage configurations using a concise URL syntax with chained adapters. | ||||||||
|
|
||||||||
| ZEP 8 URLs use pipe (``|``) characters to chain storage adapters together: | ||||||||
|
|
||||||||
| >>> # Basic ZIP file storage | ||||||||
| >>> zarr.open_array("file:zep8-data.zip|zip", mode='w', shape=(10, 10), chunks=(5, 5), dtype="f4") | ||||||||
| <Array zip://zep8-data.zip shape=(10, 10) dtype=float32> | ||||||||
|
|
||||||||
| The general syntax is:: | ||||||||
|
|
||||||||
| scheme:path|adapter1|adapter2|... | ||||||||
|
|
||||||||
| Where: | ||||||||
|
|
||||||||
| * ``scheme:path`` specifies the base storage location | ||||||||
| * ``|adapter`` chains storage adapters to transform or wrap the storage | ||||||||
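
The general syntax above can be illustrated with a small standalone sketch. This is not zarr-python's parser: the `Segment` type and `split_url` function are hypothetical names invented for illustration, and the splitting rule (pipe between segments, first colon between name and path) is a simplification of what ZEP 8 describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One pipe-separated piece of a ZEP 8 style URL (hypothetical type)."""

    name: str  # scheme or adapter name, e.g. "file" or "zip"
    path: str  # everything after the first ":" (may be empty)


def split_url(url: str) -> list[Segment]:
    """Split ``scheme:path|adapter1|adapter2`` into ordered segments."""
    segments = []
    for part in url.split("|"):
        if not part:
            raise ValueError(f"empty segment in URL: {url!r}")
        # Only the first ":" separates the name from the path, so
        # "s3://bucket/data.zip" keeps "//bucket/data.zip" as its path.
        name, _, path = part.partition(":")
        segments.append(Segment(name, path))
    return segments


print(split_url("file:data.zip|zip"))
# [Segment(name='file', path='data.zip'), Segment(name='zip', path='')]
```

Reading the result left to right gives the order in which the base location and each adapter would be applied. A real implementation would also need to handle query strings and platform-specific paths, which this sketch ignores.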
|
|
||||||||
| Common ZEP 8 URL patterns: | ||||||||
|
|
||||||||
| **Local ZIP files:** | ||||||||
|
|
||||||||
| >>> # Create data in a ZIP file | ||||||||
| >>> z = zarr.open_array("file:example.zip|zip", mode='w', shape=(100, 100), chunks=(10, 10), dtype="i4") | ||||||||
| >>> import numpy as np | ||||||||
| >>> z[:, :] = np.random.randint(0, 100, size=(100, 100)) | ||||||||
|
|
||||||||
| **Remote ZIP files:** | ||||||||
|
|
||||||||
| >>> # Access ZIP file from S3 (requires s3fs) | ||||||||
| >>> zarr.open_array("s3://bucket/data.zip|zip", mode='r') # doctest: +SKIP | ||||||||
|
|
||||||||
| **In-memory storage:** | ||||||||
|
|
||||||||
| >>> # Create array in memory | ||||||||
| >>> z = zarr.open_array("memory:", mode='w', shape=(5, 5), dtype="f4") | ||||||||
|
Review comment: Can I then access this from somewhere else using this syntax? e.g. |
||||||||
| >>> z[:, :] = np.random.random((5, 5)) | ||||||||
|
|
||||||||
| **With format specification:** | ||||||||
|
|
||||||||
| >>> # Specify Zarr format version | ||||||||
| >>> zarr.create_array("file:data-v3.zip|zip|zarr3", shape=(10,), dtype="i4") # doctest: +SKIP | ||||||||
|
|
||||||||
| **Debugging with logging:** | ||||||||
|
|
||||||||
| >>> # Log all operations on any store type | ||||||||
| >>> z = zarr.open_array("memory:|log:", mode='w', shape=(5, 5), dtype="f4") # doctest: +SKIP | ||||||||
| >>> # 2025-08-24 20:01:13,282 - LoggingStore(memory://...) - INFO - Calling MemoryStore.set(zarr.json) | ||||||||
| >>> | ||||||||
| >>> # Log operations on ZIP files with custom log level | ||||||||
| >>> z = zarr.open_array("file:debug.zip|zip:|log:?log_level=INFO", mode='w') # doctest: +SKIP | ||||||||
| >>> | ||||||||
| >>> # Log operations on remote cloud storage | ||||||||
| >>> z = zarr.open_array("s3://bucket/data.zarr|log:", mode='r') # doctest: +SKIP | ||||||||
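
To make the idea of a wrapping adapter concrete, here is a toy sketch of what a chain such as ``memory:|log:`` does conceptually: the right-hand adapter wraps the store produced by the left-hand segment and delegates to it. `DictStore` and `LoggingWrapper` are hypothetical stand-ins written for this example; they are not zarr-python's `MemoryStore` or `LoggingStore`, and the real store interface is asynchronous and much richer.

```python
class DictStore:
    """Stand-in for an in-memory store: keys map to bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class LoggingWrapper:
    """Stand-in for a logging adapter: records each call, then delegates."""

    def __init__(self, inner, log):
        self._inner = inner
        self._log = log

    def set(self, key, value):
        self._log.append(f"{type(self._inner).__name__}.set({key})")
        self._inner.set(key, value)

    def get(self, key):
        self._log.append(f"{type(self._inner).__name__}.get({key})")
        return self._inner.get(key)


calls = []
store = LoggingWrapper(DictStore(), calls)  # conceptually "memory:|log:"
store.set("zarr.json", b"{}")
value = store.get("zarr.json")
print(calls)
# ['DictStore.set(zarr.json)', 'DictStore.get(zarr.json)']
```

Because the wrapper only relies on the inner object's `set` and `get`, the same wrapper could sit on top of any store in the chain, which is the property the logging examples above depend on.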
|
|
||||||||
| Available adapters: | ||||||||
|
|
||||||||
| * ``file`` - Local filesystem paths | ||||||||
| * ``zip`` - ZIP file storage | ||||||||
| * ``memory`` - In-memory storage | ||||||||
| * ``s3``, ``gs``, ``gcs`` - Cloud storage (requires appropriate fsspec backends) | ||||||||
| * ``log`` - Logging wrapper for debugging store operations | ||||||||
| * ``zarr2``, ``zarr3`` - Format specification adapters | ||||||||
|
|
||||||||
| You can programmatically discover all available adapters using :func:`zarr.registry.list_store_adapters`: | ||||||||
|
|
||||||||
| >>> import zarr | ||||||||
| >>> zarr.registry.list_store_adapters() # doctest: +SKIP | ||||||||
| ['file', 'gcs', 'gs', 'https', 'memory', 's3', 'zip', ...] | ||||||||
|
|
||||||||
| Additional adapters can be implemented as described in the `extending guide <./extending.html#custom-store-adapters>`_. | ||||||||
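
The registry idea behind adapter discovery can be sketched in a few lines of plain Python. This is an illustration of the pattern only: `register_adapter`, `list_adapters`, and the module-level `_ADAPTERS` dict are hypothetical names, and the sketch does not reproduce how zarr-python registers adapters or loads them from entry points.

```python
# Maps an adapter name (as used in a URL segment) to a factory callable.
_ADAPTERS = {}


def register_adapter(name, factory):
    """Register a factory under a unique adapter name."""
    if name in _ADAPTERS:
        raise ValueError(f"adapter {name!r} is already registered")
    _ADAPTERS[name] = factory


def list_adapters():
    """Return registered adapter names in sorted order."""
    return sorted(_ADAPTERS)


# ``dict`` is only a placeholder factory for this example.
for adapter_name in ("zip", "file", "memory"):
    register_adapter(adapter_name, dict)

print(list_adapters())
# ['file', 'memory', 'zip']
```

Rejecting duplicate names is a design choice made here so that two packages cannot silently claim the same URL segment; a real registry might instead allow explicit overrides.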
|
|
||||||||
| .. note:: | ||||||||
| When using ZEP 8 URLs with third-party libraries like xarray, the URL syntax allows | ||||||||
| seamless integration without requiring zarr-specific store creation. | ||||||||
|
Comment on lines +223 to +225 (suggested change): This is already effectively stated above. |
||||||||
|
|
||||||||
| .. _ZEP 8\: Zarr URL Specification: https://zarr.dev/zeps/draft/ZEP0008.html | ||||||||
|
|
||||||||
| .. _user-guide-custom-stores: | ||||||||
|
|
||||||||
| Developing custom stores | ||||||||
|
|
||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| from zarr.abc.store_adapter import StoreAdapter, URLSegment | ||
|
|
||
| __all__ = ["StoreAdapter", "URLSegment"] |
Review comment:
It seems like a bad idea to break file uris.
https://datatracker.ietf.org/doc/html/rfc8089
Reply:
This is considered in the spec doc this PR is building on: zarr-developers/zeps#48
https://github.com/jbms/zeps/blob/92bc64111c7612083560358efdd4450e061f3746/draft/ZEP0008.md?plain=1#L115-L119
And later it says:
If you foresee serious issues here I'd encourage commenting on that PR on the standard.