Support Direct Loading of NetCDF Time-Series Data Without Conversion #74

@j6k4m8

Continued from Slack discussion [deeplink]

Currently, the NetCDF loaders in mllam-data-prep require data to be stored in single "long" files (i.e., all timestamps in one data element on disk).

In contrast, my dataset consists of NetCDF files with time-series data, where each file represents a single measurement (Dims = [x, y, time], with time always being a single value). Instead of concatenating these files manually, I’m exploring ways to load them directly using a more flexible datastore approach.

Proposed Solution

  • Introduce a method to glob NetCDF files in the YAML config, mapping timestamps from filenames to a proper time dimension.
  • Alternatively, improve the existing datastore, or document how to use Kerchunk to create a reference-based dataset without redundant copies.
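
To make the first option concrete, here is a rough sketch of what "glob the files and map filename timestamps to a time dimension" could look like outside of mllam-data-prep. Everything named here is an assumption for illustration: the filename convention (`<prefix>_YYYYMMDDTHHMM.nc`), the helper names, and the choice to overwrite each file's own time value with the one parsed from its name. It is not the existing mllam-data-prep API, and the xarray step needs dask installed for lazy multi-file loading.

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical filename convention: <prefix>_YYYYMMDDTHHMM.nc
FILENAME_TIME_PATTERN = re.compile(r"(\d{8}T\d{4})")
FILENAME_TIME_FORMAT = "%Y%m%dT%H%M"


def timestamp_from_filename(path):
    """Parse the timestamp embedded in a file name (assumed convention)."""
    match = FILENAME_TIME_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"no timestamp found in file name: {path}")
    return datetime.strptime(match.group(1), FILENAME_TIME_FORMAT)


def files_sorted_by_time(directory, pattern="*.nc"):
    """Glob a directory and return (paths, times) ordered by parsed timestamp."""
    paths = sorted(Path(directory).glob(pattern), key=timestamp_from_filename)
    times = [timestamp_from_filename(p) for p in paths]
    return [str(p) for p in paths], times


def open_as_time_series(directory, pattern="*.nc"):
    """Lazily open all matching files as one dataset along `time`."""
    # Imported here so the helpers above work without xarray installed.
    import numpy as np
    import xarray as xr

    paths, times = files_sorted_by_time(directory, pattern)
    # Each file holds a single time step, so concatenate in the sorted order.
    ds = xr.open_mfdataset(paths, combine="nested", concat_dim="time")
    # Overwrite the per-file time values with those parsed from the file names.
    time_values = np.array(times, dtype="datetime64[ns]")
    return ds.assign_coords(time=("time", time_values))
```

Calling `open_as_time_series("path/to/files")` would then return one dataset with dims [x, y, time] without writing a concatenated copy to disk. A datastore option in the YAML config could wrap roughly this logic. The Kerchunk route would instead produce a reference file once and open that.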

Next Steps

  • Determine whether a more flexible datastore is needed or whether better documentation (e.g., a tutorial) would suffice.
  • Evaluate performance trade-offs of different loading methods.

Would love input from others working on similar datasets!

Labels: enhancement (New feature or request)