Broadcasting Data Loader #16

philipp-fischer · 2024-09-27T14:03:45Z

Currently, when Energon is used with different types of parallelism, the user has to handle the distribution of data across ranks in the framework code around Energon.

If you are working with data parallelism (DP), tensor parallelism (TP), pipeline parallelism (PP), or other forms of parallelism, you currently have to do the following on each global rank:

- decide whether to initialize the data loader on that rank (if not, the rank needs to receive data via broadcasting)
- compute the correct data rank to set in the WorkerConfig of the loader
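The two per-rank decisions above can be sketched as a small helper. This is only an illustration under stated assumptions: the names ParallelLayout, DataPlacement and data_placement are hypothetical (not part of Energon), global ranks are assumed to be ordered with TP varying fastest, then DP, then PP, and only TP rank 0 of each tensor-parallel group is assumed to load data. Which pipeline stages actually need input batches is model-specific, so a real wrapper may restrict loading further. The resulting data rank and data world size are what would presumably go into the loader's WorkerConfig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Sizes of each parallelism dimension (hypothetical helper, not Energon API)."""
    tp: int  # tensor-parallel size
    dp: int  # data-parallel size
    pp: int  # pipeline-parallel size

    @property
    def world_size(self) -> int:
        return self.tp * self.dp * self.pp


@dataclass(frozen=True)
class DataPlacement:
    """What one global rank should do with respect to data loading."""
    loads_data: bool      # True: build a real loader; False: receive via broadcast
    data_rank: int        # rank to use in the loader's WorkerConfig
    data_world_size: int  # world size to use in the loader's WorkerConfig
    source_rank: int      # global rank that loads the data this rank consumes


def data_placement(global_rank: int, layout: ParallelLayout) -> DataPlacement:
    # Assumed rank ordering: TP varies fastest, then DP, then PP, i.e.
    # global_rank = pp_rank * (dp * tp) + dp_rank * tp + tp_rank.
    if not 0 <= global_rank < layout.world_size:
        raise ValueError(f"rank {global_rank} outside world of size {layout.world_size}")
    tp_rank = global_rank % layout.tp
    dp_rank = (global_rank // layout.tp) % layout.dp
    # All ranks of a TP group must see identical batches, so (by assumption)
    # only TP rank 0 loads; its peers receive the batch from that rank.
    source_rank = global_rank - tp_rank
    return DataPlacement(
        loads_data=tp_rank == 0,
        data_rank=dp_rank,
        data_world_size=layout.dp,
        source_rank=source_rank,
    )


layout = ParallelLayout(tp=2, dp=2, pp=2)
for rank in range(layout.world_size):
    print(rank, data_placement(rank, layout))
```

Note that the data rank depends only on the DP coordinate: every rank in the same TP group, and under this layout every PP stage, consumes the same data shard.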

This feature request proposes a new wrapper around the Energon functions get_savable_loader and get_train_dataset that handles both of these steps automatically.

The user receives a virtual data loader on each global rank. Internally, some of these ranks actually load the data from disk or an object store, while the other ranks receive it via NCCL broadcasting.
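A minimal sketch of such a virtual data loader, assuming a generic broadcast(obj, src) collective that returns obj on the source rank and the source's object on all other ranks. BroadcastingLoader and RecordingBroadcast are hypothetical names, not Energon API. On loading ranks, the wrapped loader would presumably be the one built with get_savable_loader; here it is any iterable. Each step broadcasts a (has_batch, batch) pair so that receiving ranks also learn when the source loader is exhausted. RecordingBroadcast is a single-process stand-in that buffers messages, which is why the demo can iterate the source first and the receiver afterwards; real ranks would run in lockstep.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, Optional

# Assumed collective: broadcast(obj, src) returns `obj` on rank `src` and the
# object sent by `src` on every other rank of the group.
BroadcastFn = Callable[[Any, int], Any]


class BroadcastingLoader:
    """Hypothetical 'virtual' data loader; not part of Energon's API."""

    def __init__(self, loader: Optional[Iterable[Any]], source_rank: int,
                 broadcast: BroadcastFn) -> None:
        # `loader` is the real loader on loading ranks and None on receivers.
        self._loader = loader
        self._source_rank = source_rank
        self._broadcast = broadcast

    def __iter__(self) -> Iterator[Any]:
        source_iter = iter(self._loader) if self._loader is not None else None
        while True:
            message = None
            if source_iter is not None:
                try:
                    message = (True, next(source_iter))
                except StopIteration:
                    message = (False, None)  # tells receivers the loader is exhausted
            # Every rank takes part in the broadcast; receivers pass None.
            has_batch, batch = self._broadcast(message, self._source_rank)
            if not has_batch:
                return
            yield batch


class RecordingBroadcast:
    """Single-process stand-in for a real collective, for illustration only."""

    def __init__(self) -> None:
        self._sent: deque = deque()

    def send(self, obj: Any, src: int) -> Any:
        self._sent.append(obj)  # the source keeps its own object
        return obj

    def receive(self, obj: Any, src: int) -> Any:
        return self._sent.popleft()  # receivers get what the source sent


bc = RecordingBroadcast()
source = BroadcastingLoader([{"x": 1}, {"x": 2}], source_rank=0, broadcast=bc.send)
receiver = BroadcastingLoader(None, source_rank=0, broadcast=bc.receive)
print(list(source))    # the loading rank iterates its own data
print(list(receiver))  # the receiving rank sees the same batches
```

In a real job, the broadcast callable could be built on torch.distributed.broadcast_object_list, restricted to the ranks that share a data source. Because that call pickles its payload, broadcasting batch tensors directly with torch.distributed.broadcast is likely cheaper over NCCL; which variant fits best is left open here.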
