Using DALI vs. Tarred audio datasets #3026
Replies: 2 comments 6 replies
DALI currently does not support tarred datasets, so in an HPC environment where network communication is non-negligible, tarred datasets are the recommended option. On the other hand, for training on a single machine or a single node, DALI will be much faster, as long as the data sits on a local disk so that little network bandwidth is used.
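The network-I/O argument for tarred datasets can be made concrete with a minimal sketch. It assumes a plain directory of audio files with unique base names and packs them into a few tar shards. A job on a network filesystem then opens a handful of large files and reads each one sequentially, rather than issuing one open/read per utterance. The helper names (`shard_audio_files`, `iter_shard`), the shard file naming, and the `shard_id` record field are all hypothetical. This is not NeMo's own tarred-dataset format or conversion tooling, which lays out its manifests differently.

```python
import tarfile
import tempfile
from pathlib import Path


def shard_audio_files(audio_paths, out_dir, files_per_shard=1000):
    """Pack many small audio files into a few tar shards (hypothetical helper).

    Assumes base names are unique, because each file is stored flat under its
    base name. Returns one manifest-style record per file noting its shard.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for shard_id, start in enumerate(range(0, len(audio_paths), files_per_shard)):
        shard_path = out_dir / f"audio_{shard_id}.tar"
        with tarfile.open(str(shard_path), "w") as tar:
            for path in audio_paths[start:start + files_per_shard]:
                path = Path(path)
                tar.add(str(path), arcname=path.name)
                records.append({"audio_filepath": path.name, "shard_id": shard_id})
    return records


def iter_shard(shard_path):
    """Yield (name, bytes) for each member, reading the tar as a forward-only stream."""
    # "r|" is tarfile's streaming mode: no seeking, members are read in order.
    with tarfile.open(str(shard_path), "r|") as tar:
        for member in tar:
            if member.isfile():
                # In stream mode the current member must be read before advancing.
                yield member.name, tar.extractfile(member).read()


if __name__ == "__main__":
    # Tiny demo with fake "audio" payloads; real data would be .wav/.flac files.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        paths = []
        for i in range(5):
            p = src / f"utt{i}.wav"
            p.write_bytes(bytes([i]) * 16)
            paths.append(p)
        records = shard_audio_files(paths, Path(tmp) / "shards", files_per_shard=2)
        print(records)
        for name, data in iter_shard(Path(tmp) / "shards" / "audio_1.tar"):
            print(name, len(data))
```

Five files with `files_per_shard=2` produce three shards. Each worker can then be handed whole shards, so it only needs to open a handful of large files.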
Nice, great to hear! On a similar note, have you come across the following error when using DALI on multiple GPUs?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:X and cuda:Y

Here X and Y are GPU IDs. I can open an issue if you can reproduce it.
Is there a discussion anywhere of the data-loading performance benefits of DALI vs. creating tarred audio datasets?
When should we use one over the other and what are the pros/cons of each?
Some insight from the NeMo team would be appreciated :)