refactor(generators): unify generators to work with any storage backend (argonne-lcf#329)
Every new storage backend required copy-pasting each generator into an
_XXX sibling file: npz_generator_s3.py, npy_generator_s3.py and so on.
The only difference was whether the output was written to local disk
directly via numpy/PIL or through the storage interface.
This makes the pattern unsustainable: two duplicated formats today, more
with each new backend, and a significant maintenance burden.
Since all generators already had a storage instance and used it to
generate file names, we can leverage it.
The single set of generators can now check whether the storage is local
via `islocalfs` and apply an optimisation, if any. If the storage is not local,
the generator serializes the sample to io.BytesIO, calls buf.getvalue(), and
delegates to self.storage.put_data().
All storage backends receive plain bytes as designed by the storage interface,
removing type inspection, seek() and getvalue() calls scattered across backends.
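A minimal sketch of the write path described above, not the actual generator code. The classes `MemoryStorage` and `LocalStorage`, the helpers `write_sample` and `raw_bytes`, and the direct-to-file local shortcut are hypothetical stand-ins; only `islocalfs`, `put_data`, io.BytesIO and getvalue() come from this message.

```python
import io
import os
import tempfile


class MemoryStorage:
    """Hypothetical non-local backend: keeps objects in a dict."""

    def __init__(self):
        self.objects = {}

    def islocalfs(self):
        return False

    def put_data(self, path, data):
        # Receives plain bytes only; no seek()/getvalue() needed here.
        self.objects[path] = data


class LocalStorage:
    """Hypothetical local backend."""

    def islocalfs(self):
        return True

    def put_data(self, path, data):
        with open(path, "wb") as f:
            f.write(data)


def raw_bytes(sample, fileobj):
    # Toy serializer for illustration; a real generator would write its
    # own format (for example via numpy or PIL) to the same file object.
    fileobj.write(bytes(sample))


def write_sample(storage, path, sample, serialize=raw_bytes):
    if storage.islocalfs():
        # Assumed local shortcut: stream straight into the file.
        with open(path, "wb") as f:
            serialize(sample, f)
        return
    # Non-local: serialize in memory, then hand plain bytes to the backend.
    buf = io.BytesIO()
    serialize(sample, buf)
    storage.put_data(path, buf.getvalue())
```

The generator never inspects which backend it has beyond `islocalfs`, so a new backend only needs to implement `put_data` for bytes.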
- FileStorage.put_data was never called; it opened files in text mode and
resolved the URI twice (once in the generator, once inside put_data itself).
It is now the default write path for LOCAL_FS, which almost every
workload config uses. get_data is aligned to binary mode ("rb") for consistency.
- AIStoreStorage.put_data: remove isinstance dispatch, accept bytes directly.
- S3TorchStorage.put_data: remove data.getvalue(); just write data.
- generator_factory: removed S3/AIStore branching for NPZ, NPY, JPEG.
- factory referenced jpeg_generator_s3.JPEGGeneratorS3 which never existed;
JPEG + S3/AIStore would crash at import time.
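The local-backend changes listed above can be sketched roughly as follows. This is an illustration under assumptions, not the real FileStorage: the class name `LocalFileStorage`, its constructor and the exact method signatures are hypothetical. It shows binary-mode reads and writes and a put_data that trusts the already-resolved path instead of resolving it again.

```python
import os
import tempfile


class LocalFileStorage:
    """Hypothetical stand-in for a local file backend."""

    def __init__(self, root):
        self.root = root

    def get_uri(self, name):
        # Resolved once by the caller (the generator).
        return os.path.join(self.root, name)

    def put_data(self, uri, data):
        # Binary mode: data is plain bytes, and uri is used as given.
        with open(uri, "wb") as f:
            f.write(data)

    def get_data(self, uri):
        # Binary mode to match put_data.
        with open(uri, "rb") as f:
            return f.read()
```

Text mode would fail or corrupt payloads such as NPZ or JPEG that are not valid text, which is why both directions use binary mode in this sketch.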
After this patch, adding a new storage backend requires no changes in any
generator. Adding a new data format automatically works with all backends.
Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com>
Co-authored-by: Denis Barakhtanov <denis.barahtanov@gmail.com>