Open
Labels
enhancement: New feature or request
Description
Continued from a Slack discussion [deeplink].
Currently, the mllam-data-prep NetCDF loaders require the data to be in a single "long" file (i.e., all timestamps in one data element on disk).
In contrast, my dataset consists of many NetCDF files of time-series data, where each file holds a single measurement (dims = [x, y, time], with time always having length one). Instead of concatenating these files manually, I’m exploring ways to load them directly using a more flexible datastore approach.
Proposed Solution
- Introduce a method to glob NetCDF files in the YAML config, mapping timestamps from filenames to a proper time dimension.
- Alternatively, improve the existing datastore, or document how to use Kerchunk to create a reference-based dataset without redundant copies of the data.
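To make the first option concrete, here is a rough sketch of what globbing plus filename-based timestamps could look like with plain xarray. It is not how mllam-data-prep works today. The filename pattern (`radar_20240101T1200.nc`), the regex, and the helper names `time_from_filename`, `sort_by_time` and `open_single_time_files` are all assumptions made for illustration. If the files already carry a correct `time` coordinate, `xr.open_mfdataset(pattern, combine="by_coords")` alone may be enough.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed filename convention, e.g. "radar_20240101T1200.nc" (hypothetical).
TIMESTAMP_RE = re.compile(r"(\d{8}T\d{4})")
TIMESTAMP_FMT = "%Y%m%dT%H%M"


def time_from_filename(path):
    """Parse the timestamp embedded in a file name."""
    match = TIMESTAMP_RE.search(Path(path).name)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return datetime.strptime(match.group(1), TIMESTAMP_FMT)


def sort_by_time(paths):
    """Order files chronologically by the timestamp in their names."""
    return sorted(paths, key=time_from_filename)


def open_single_time_files(pattern):
    """Lazily open all files matching `pattern` as one dataset along `time`."""
    # Imported here so the parsing helpers above work without these packages.
    import glob

    import numpy as np
    import xarray as xr

    paths = sort_by_time(glob.glob(pattern))

    def _set_time(ds):
        # xarray records the originating file path in ds.encoding["source"].
        stamp = time_from_filename(ds.encoding["source"])
        return ds.assign_coords(time=[np.datetime64(stamp, "ns")])

    return xr.open_mfdataset(
        paths, preprocess=_set_time, combine="nested", concat_dim="time"
    )
```

A call such as `open_single_time_files("data/radar_*.nc")` would then give a single dataset with a proper time dimension, without writing a concatenated copy to disk.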
Related Discussions
Next Steps
- Determine if a more flexible datastore is needed or if an improved documentation approach (e.g., a tutorial) would suffice.
- Evaluate performance trade-offs of different loading methods.
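For the performance comparison, a small timing helper along these lines might be a starting point. The names `best_time` and `compare_loaders` are made up for this sketch, and each loader is assumed to be a zero-argument callable that actually reads the data (for lazy xarray/dask datasets that means calling something like `.load()` inside the loader). OS file caching will also affect repeated runs.

```python
import time


def best_time(loader, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        loader()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_loaders(loaders, repeats=3):
    """Map each loader name to its best time, fastest first."""
    results = {name: best_time(fn, repeats) for name, fn in loaders.items()}
    return dict(sorted(results.items(), key=lambda item: item[1]))
```

The `loaders` argument would be a dict such as `{"open_mfdataset": ..., "kerchunk": ..., "concatenated file": ...}`, with one callable per loading method under test.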
Would love input from others working on similar datasets!