Typing of internal datatypes #7457

Open
@headtr1ck

Description

Is your feature request related to a problem?

Currently, the underlying data structures used in DataArrays are not statically typed.
For example, reveal_type(da.data) reports Any.

Adding static typing support for this is unfortunately non-trivial, since xarray supports a wide variety of duck-typed arrays.

This also comes with internal typing difficulties.

Describe the solution you'd like

I think the way to go is to make the DataArray class generic in its underlying data type.
Something like DataArray[np.ndarray] or DataArray[dask.array.Array].

The implementation would require a TypeVar that is bound to some minimal required Protocol for internal consistency (I think at least it needs dtype and shape attributes).
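A minimal sketch of what such a bound TypeVar could look like. The names DuckArray, T_Array, InMemoryArray and ToyDataArray are hypothetical and chosen only for illustration; none of them is part of xarray, and a small dataclass stands in for a real array type so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Any, Generic, Protocol, Tuple, TypeVar


class DuckArray(Protocol):
    """Hypothetical minimal protocol: anything exposing dtype and shape."""

    @property
    def dtype(self) -> Any: ...

    @property
    def shape(self) -> Tuple[int, ...]: ...


@dataclass(frozen=True)
class InMemoryArray:
    """Stand-in for a concrete array type such as np.ndarray (assumption)."""

    dtype: str
    shape: Tuple[int, ...]


# The TypeVar is bound to the protocol, so only duck arrays are accepted.
T_Array = TypeVar("T_Array", bound=DuckArray)


class ToyDataArray(Generic[T_Array]):
    """Toy generic container; not xarray's actual DataArray."""

    def __init__(self, data: T_Array) -> None:
        self._data = data

    @property
    def data(self) -> T_Array:
        # The return type follows the type parameter instead of being Any.
        return self._data


da = ToyDataArray(InMemoryArray(dtype="float64", shape=(2, 3)))
# A static type checker is expected to treat `da` as ToyDataArray[InMemoryArray],
# so `da.data` would be typed as InMemoryArray rather than Any.
print(da.data.shape)
```

With this shape, code inside the container can rely on dtype and shape for every accepted type, while callers keep the concrete array type when they access data.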

Datasets would have to be typed the same way. This means only one datatype is possible for all variables; when you mix datatypes, the type falls back to the common ancestor, which is the aforementioned protocol. This is basically the same restriction that a dict has.
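The dict analogy can be sketched as follows. ToyDataset, EagerArray and LazyArray are hypothetical stand-ins (roughly for a Dataset, a numpy array and a dask array); the mixed case is annotated explicitly with the protocol, since what a type checker infers without an annotation depends on the checker.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, Protocol, Tuple, TypeVar


class DuckArray(Protocol):
    """Hypothetical minimal protocol shared by all supported array types."""

    @property
    def dtype(self) -> Any: ...

    @property
    def shape(self) -> Tuple[int, ...]: ...


@dataclass(frozen=True)
class EagerArray:
    """Stand-in for an in-memory array type (assumption)."""

    dtype: str
    shape: Tuple[int, ...]


@dataclass(frozen=True)
class LazyArray:
    """Stand-in for a lazily evaluated array type (assumption)."""

    dtype: str
    shape: Tuple[int, ...]


T_Array = TypeVar("T_Array", bound=DuckArray)


class ToyDataset(Generic[T_Array]):
    """Toy container with a single type parameter for all variables."""

    def __init__(self, variables: Dict[str, T_Array]) -> None:
        self.variables = variables


# All variables share one type, so the parameter can stay specific.
homogeneous = ToyDataset(
    {"u": LazyArray("float64", (4,)), "v": LazyArray("float64", (4,))}
)

# Mixed types: the only parameter that fits both values is the protocol,
# so the specific array type is no longer visible to the type checker.
mixed: ToyDataset[DuckArray] = ToyDataset(
    {"temperature": LazyArray("float64", (4,)), "time": EagerArray("int64", (4,))}
)
print(sorted(mixed.variables))
```

In the mixed case, only the attributes declared on the protocol remain available on each variable without a cast.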

Now to the main issue that I see with this approach:
I don't know how to type coordinates. They have the same problem as described above for Datasets.
I think it is very common to have dask arrays in the variables but plain numpy arrays in the coordinates, so either coordinates are excluded from the typing, or in such cases the generic type falls back to the protocol again.
I am not sure which approach is best here.

Describe alternatives you've considered

Since the most common workflow for beginners and intermediate-to-advanced users is to stick with the DataArrays themselves and never touch the underlying data, I am not sure this change is as beneficial as I want it to be. Maybe it just complicates things, and leaving the type as Any is the simpler option, with advanced users casting or ignoring it where needed.

Additional context

It came up in this discussion:
#7020 (comment)
