
some improvements to our configuration model #3538

@d-v-b


I have a few complaints about our config model. For this issue, I consider our explicit zarr.config object and the various registries (data types, codecs, etc.) to be part of our config model.

  • Our config is effectively an untyped dict, which has two main drawbacks:
    • the config API is not IDE / autocomplete friendly
    • the config API does not emit errors when invalid or unknown configuration values are set
  • When creating an array, you can provide a custom codec without registering it. But when reading an array, there's no way to explicitly declare the codec classes you would like to use. Instead, you have to take a very indirect approach: register the codec AND declare it in the global config object. This is not smooth.
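To make the second drawback concrete, here is a toy dict-backed config. `UntypedConfig` is a hypothetical stand-in written for illustration, not zarr's actual config class; it only mimics the behavior described above, where unknown keys are accepted without complaint.

```python
from typing import Any


class UntypedConfig:
    """Hypothetical dict-backed config (illustration only, not zarr's class)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {"array.order": "C"}

    def set(self, values: dict[str, Any]) -> None:
        # Every key is accepted, including misspelled ones.
        self._data.update(values)

    def get(self, key: str) -> Any:
        return self._data[key]


config = UntypedConfig()
config.set({"array.ordr": "F"})  # typo: no error is raised
print(config.get("array.order"))  # C (the intended setting was silently lost)
```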

I have some ideas for addressing these concerns.

  1. Define an explicit, typed model of our global config. Setting invalid keys in the config will be an error. I don't think we need runtime type checks on values, because values of the wrong type will surface as runtime errors anyway, but we will define a static API surface for the config. If we do need runtime type checks, there's always #3400 (add a runtime type checker for metadata objects).

Under this proposal, instead of this:

zarr.config.set({'array.order': 'F'})

we would have something like these options:

zarr.config.array.set_order('F')
zarr.config.array.order = 'F'

We can keep the old config API around for a while as a wrapper over the new API to make the transition smooth.

  2. Add a new keyword argument to array / group access routines that accepts an object registry. Something like this:
 x = read_array(..., context={"data_type_registry": {"uint8": MyUint8Class}})

context is either a string or a mapping with string keys.
The default value of context could be the literal string "config", which uses a context defined in the global config. We could add more string values if we want to define separate prepared contexts, e.g. "cuda", which contains all CUDA codecs. But the user also has the option to define a context explicitly, which is useful for loading an array or group with exactly the desired data type / codec / chunk grid classes without modifying the global config.
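A sketch of how `context` might be resolved on the read path, assuming a context is a mapping from registry names (such as `"data_type_registry"`) to mappings of identifiers to classes. `resolve_context`, `resolve_data_type`, `NAMED_CONTEXTS`, and the placeholder classes are hypothetical; in a real implementation the `"config"` entry would be read from the global config rather than stored in a module-level dict.

```python
from collections.abc import Mapping
from typing import Any, Union

# Hypothetical layout: registry name -> (identifier -> class).
Registry = Mapping[str, type]
Context = Mapping[str, Registry]


class BuiltinUint8:
    """Placeholder for a default data type class."""


class MyUint8Class:
    """Placeholder for a user-supplied data type class."""


# Stand-in for the context stored in the global config.
GLOBAL_CONFIG_CONTEXT: Context = {"data_type_registry": {"uint8": BuiltinUint8}}

# Named, prepared contexts; a "cuda" entry could list CUDA codecs.
NAMED_CONTEXTS: dict[str, Context] = {"config": GLOBAL_CONFIG_CONTEXT}


def resolve_context(context: Union[str, Context] = "config") -> Context:
    """A string selects a prepared context; a mapping is used as-is."""
    if isinstance(context, str):
        try:
            return NAMED_CONTEXTS[context]
        except KeyError:
            raise ValueError(f"unknown context name: {context!r}") from None
    return context


def resolve_data_type(
    metadata: Mapping[str, Any], context: Union[str, Context] = "config"
) -> type:
    """Look up the class for metadata["data_type"] in the resolved context."""
    registry = resolve_context(context)["data_type_registry"]
    name = metadata["data_type"]
    try:
        return registry[name]
    except KeyError:
        raise ValueError(f"no class registered for data type {name!r}") from None


metadata = {"data_type": "uint8"}
print(resolve_data_type(metadata).__name__)  # BuiltinUint8
explicit = {"data_type_registry": {"uint8": MyUint8Class}}
print(resolve_data_type(metadata, context=explicit).__name__)  # MyUint8Class
```

An explicit mapping is used as-is, with no fallback to the global config, which matches the goal of loading an array with exactly the desired classes.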

Expect some work in these directions soon.
