reproducer:
# /// script
# dependencies = [
# "zarr == 2.18",
# "numpy == 2.2",
# "numcodecs == 0.15.0",
# ]
# ///
import numpy as np
import zarr
import numcodecs
arr = zarr.create(store={}, shape=(10,), dtype=np.dtypes.StringDType(), object_codec=numcodecs.VLenUTF8())
print(arr[:])
Traceback (most recent call last):
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 118, in decode_array_metadata
    dtype = cls.decode_dtype(meta["dtype"])
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 193, in decode_dtype
    return np.dtype(d)
           ^^^^^^^^^^^
TypeError: data type '|T16' not understood

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bennettd/dev/zarr-python/foo.py", line 12, in <module>
    arr = zarr.create(store={}, shape=(10,), dtype=np.dtypes.StringDType(), object_codec=numcodecs.VLenUTF8())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/creation.py", line 227, in create
    z = Array(
        ^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 170, in __init__
    self._load_metadata()
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/core.py", line 207, in _load_metadata_nosync
    meta = self._store._metadata_class.decode_array_metadata(meta_bytes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennettd/.cache/uv/archive-v0/Pv26IvD846wSGOG2rRA48/lib/python3.11/site-packages/zarr/meta.py", line 141, in decode_array_metadata
    raise MetadataError("error decoding metadata") from e
zarr.errors.MetadataError: error decoding metadata
zarr-python 2.18 assumes that any numpy data type can be reconstructed from its string representation (e.g., np.dtype('uint8') makes a uint8 dtype). The relatively new numpy variable-length string data type (np.dtypes.StringDType), whose string representation is '|T16' in the traceback above, does not allow this kind of construction.
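The failing step can be isolated from zarr entirely. Below is a minimal sketch, assuming the metadata stores `dtype.str` (which the '|T16' in the traceback suggests); the helper name `roundtrips_via_str` is mine, not part of zarr or numpy, and I have only observed this behavior with numpy 2.2.

```python
import numpy as np


def roundtrips_via_str(dtype):
    """Return True if np.dtype(dtype.str) rebuilds an equal dtype.

    Hypothetical helper: it mimics the encode/decode assumption described
    above, not zarr's actual code.
    """
    try:
        return bool(np.dtype(dtype.str) == dtype)
    except TypeError:
        # numpy raises TypeError when it cannot parse the string,
        # as with '|T16' in the traceback above.
        return False


# Classic fixed-width dtypes survive the round trip.
print(roundtrips_via_str(np.dtype("uint8")))

# StringDType only exists in numpy >= 2.0, so guard the lookup.
string_dtype_cls = getattr(getattr(np, "dtypes", None), "StringDType", None)
if string_dtype_cls is not None:
    sdt = string_dtype_cls()
    print(sdt.str, roundtrips_via_str(sdt))
```

With the versions pinned in the reproducer, I would expect the last line to report that the round trip fails, matching the TypeError in the traceback.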
This means that a zarr format 2 array created with the new numpy variable-length string data type cannot be read from zarr-python 2.18. If we care about keeping zarr-python 2.x forward compatible, we could release a new version; otherwise, we can accept the rift here.