zarr python 2.18 is incompatible with numpy's variable-length strings dtype #3102

Open
@d-v-b

Description

reproducer:

# /// script
# dependencies = [
#   "zarr == 2.18",
#   "numpy == 2.2",
#   "numcodecs == 0.15.0",
# ]
# ///
import numpy as np
import zarr
import numcodecs

arr = zarr.create(store={}, shape=(10,), dtype=np.dtypes.StringDType(), object_codec=numcodecs.VLenUTF8())
print(arr[:])

Traceback (most recent call last):
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 118, in decode_array_metadata
    dtype = cls.decode_dtype(meta["dtype"])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 193, in decode_dtype
    return np.dtype(d)
           ^^^^^^^^^^^
TypeError: data type '|T16' not understood

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bennettd/dev/zarr-python/foo.py", line 12, in <module>
    arr = zarr.create(store={}, shape=(10,), dtype=np.dtypes.StringDType(), object_codec=numcodecs.VLenUTF8())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/creation.py", line 227, in create
    z = Array(
        ^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 170, in __init__
    self._load_metadata()
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 207, in _load_metadata_nosync
    meta = self._store._metadata_class.decode_array_metadata(meta_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 141, in decode_array_metadata
    raise MetadataError("error decoding metadata") from e
zarr.errors.MetadataError: error decoding metadata

zarr-python 2.18 assumes that a NumPy data type can be reconstructed from its string representation (e.g., np.dtype('uint8') returns the uint8 dtype). The relatively new NumPy variable-length string data type, whose string representation is "|T16" in the traceback above, does not allow this kind of construction: np.dtype('|T16') raises a TypeError.
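The round-trip assumption can be checked without zarr. The sketch below is not zarr-python code: `roundtrips_via_str` is a hypothetical helper, and it assumes the stored metadata string is `dtype.str`, as the `|T16` in the traceback suggests. It only calls `np.dtype(...)`, the same call that fails in `decode_dtype`.

```python
import numpy as np


def roundtrips_via_str(dtype) -> bool:
    """Return True if np.dtype(dtype.str) rebuilds a dtype equal to `dtype`.

    Hypothetical helper for illustration only; it mimics the assumption
    described above, not zarr-python's actual implementation.
    """
    try:
        return bool(np.dtype(dtype.str) == dtype)
    except TypeError:
        # NumPy raises TypeError ("data type ... not understood") for
        # strings it cannot parse, as in the traceback above.
        return False


# A fixed-width dtype survives the string round trip.
print(roundtrips_via_str(np.dtype("uint8")))

# StringDType only exists in NumPy >= 2.0, so look it up defensively.
string_dtype_cls = getattr(getattr(np, "dtypes", None), "StringDType", None)
if string_dtype_cls is not None:
    dt = string_dtype_cls()
    print(dt.str, roundtrips_via_str(dt))
```

With the versions pinned in the reproducer, the second check should report that the variable-length string dtype does not round-trip. Other NumPy versions may behave differently.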

This means that a Zarr format 2 array created with the new NumPy variable-length string data type will not be accessible from zarr-python 2.18. If we care about keeping zarr-python 2.x forward compatible, we could release a new version; otherwise, we can accept the rift here.

Labels: V2 (Affects the v2 branch), bug (Potential issues with the zarr-python library)
