DataJoint codecs for storing arrays in Zarr format with schema-addressed paths.
This package provides DataJoint codecs that store NumPy arrays in Zarr format in object storage, using DataJoint's schema-addressed storage system. The result is browsable, organized storage that mirrors your database structure.
- Schema-addressed storage: Paths mirror database structure ({schema}/{table}/{pk}/{field}.zarr)
- Zarr format: Portable, cloud-optimized array storage with chunking and compression
- Lazy loading: Efficient access to large arrays without loading entire datasets
- Direct access: Use zarr.open(ref.fsmap) for advanced Zarr features
- Automatic registration: Codecs are automatically available after installation
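To make the {schema}/{table}/{pk}/{field}.zarr template concrete, here is a minimal sketch of how such a path could be assembled. The helper name `schema_addressed_path` and the `key=value` encoding of the primary key are assumptions made for illustration; the package may encode the primary key differently.

```python
def schema_addressed_path(schema, table, primary_key, field):
    """Build a '{schema}/{table}/{pk}/{field}.zarr' style path.

    Hypothetical helper: the key=value encoding of the primary key is an
    assumption for illustration, not the package's documented layout.
    """
    # Sort key attributes so the same key always maps to the same path.
    pk = "_".join(f"{name}={value}" for name, value in sorted(primary_key.items()))
    return f"{schema}/{table}/{pk}/{field}.zarr"


path = schema_addressed_path("my_schema", "recording", {"recording_id": 1}, "waveform")
print(path)  # my_schema/recording/recording_id=1/waveform.zarr
```

Because the path is derived only from the schema, table, key, and field names, anyone browsing the bucket can locate an array without querying the database first.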
Install with pip:

    pip install dj-zarr-codecs

Quick start:

    import datajoint as dj
    import numpy as np

    schema = dj.Schema("my_schema")


    @schema
    class Recording(dj.Manual):
        definition = """
        recording_id : int32
        ---
        waveform : <zarr@>  # Stored as Zarr array
        """


    # Insert a NumPy array
    Recording.insert1(
        {
            "recording_id": 1,
            "waveform": np.random.randn(1000, 32),
        }
    )

    # Fetch returns a read-only Zarr array
    zarr_array = (Recording & {"recording_id": 1}).fetch1("waveform")

    # Use directly with NumPy
    result = np.mean(zarr_array, axis=0)

    # Or access as Zarr for advanced features
    print(zarr_array.shape)  # (1000, 32)
    print(zarr_array.chunks)  # Zarr chunking info

Configure your object storage in DataJoint:

    dj.config["stores"] = {
        "mystore": {
            "protocol": "s3",
            "endpoint": "s3.amazonaws.com",
            "bucket": "my-bucket",
            "location": "datajoint",
        }
    }

The <zarr@> codec stores NumPy arrays in Zarr format with schema-addressed paths.
Features:
- Portable Zarr format (readable by any Zarr library)
- Efficient chunked storage
- Optional compression
- Schema-addressed paths for organization
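
As a rough illustration of why chunked storage makes partial reads cheap, the sketch below computes which chunks a row slice overlaps. It assumes the array is chunked only along its first axis with 100 rows per chunk; that chunk size and the helper `chunks_touched` are hypothetical. The real chunk shape is chosen at write time and is reported by the array's `.chunks` attribute.

```python
import math


def chunks_touched(start, stop, chunk_len):
    """Return indices of the chunks along one axis that overlap [start, stop)."""
    if stop <= start:
        return []
    first = start // chunk_len
    last = (stop - 1) // chunk_len
    return list(range(first, last + 1))


n_rows, chunk_rows = 1000, 100  # chunk size is an assumed value
total_chunks = math.ceil(n_rows / chunk_rows)
needed = chunks_touched(250, 420, chunk_rows)
print(needed)  # [2, 3, 4]
print(f"{len(needed)} of {total_chunks} chunks read")  # 3 of 10 chunks read
```

Under these assumptions, slicing rows 250 to 419 fetches three chunks from object storage rather than the whole array.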
Usage:

    class MyTable(dj.Manual):
        definition = """
        id : int32
        ---
        data : <zarr@>  # Default store
        large_data : <zarr@s3>  # Specific store
        """

Development setup:

    git clone https://github.com/datajoint/dj-zarr-codecs.git
    cd dj-zarr-codecs
    pip install -e ".[dev]"

Run the tests:

    pytest

This project uses Ruff for linting and formatting:

    ruff check src tests
    ruff format src tests

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License. Copyright (c) 2026 DataJoint Inc. See LICENSE for details.
- DataJoint - Framework for scientific data pipelines
- Zarr - Chunked, compressed, N-dimensional arrays
- datajoint-python - DataJoint for Python
- DataJoint Documentation - Complete DataJoint documentation
- GitHub Discussions - Ask questions and share use cases
- GitHub Issues - Report bugs and request features