Using DALI vs. Tarred audio datasets #3026

piraka9011 · 2021-10-20T17:23:48Z

piraka9011
Oct 20, 2021

Is there a discussion anywhere on the data loading performance benefits of DALI vs. creating tarred audio datasets?
When should we use one over the other and what are the pros/cons of each?

Some insight from the NeMo team would be appreciated :)

titu1994 · 2021-10-20T18:32:09Z

titu1994
Oct 20, 2021
Maintainer

DALI currently does not support tarred datasets, so in an HPC environment where network communication is non-negligible, a tarred dataset is advised. On the other hand, if you want training speed on a single machine (single-node training), DALI will be much better, as long as the data sits on a local disk so that little network bandwidth is used.

1 reply

titu1994 Oct 20, 2021
Maintainer

Scratch that, DALI v1.7 now does support webdataset, and I'll look into supporting it in NeMo too.
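
For readers new to the term: a tarred dataset packs many small audio files into a few large tar shards, so a loader reads each shard sequentially instead of opening thousands of individual files over the network. The sketch below illustrates only that idea, using Python's standard library. `write_tar_shards` and `iter_shard` are hypothetical helper names and are not part of the NeMo, DALI, or webdataset APIs. The shard naming and size are also assumptions.

```python
import io
import tarfile
from pathlib import Path


def write_tar_shards(files, out_dir, shard_size=2):
    """Pack (name, bytes) pairs into numbered tar shards; return the shard paths.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shard_paths = []
    for start in range(0, len(files), shard_size):
        shard_path = out_dir / f"audio_{start // shard_size}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for name, payload in files[start:start + shard_size]:
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)  # size must be set before addfile
                tar.addfile(info, io.BytesIO(payload))
        shard_paths.append(shard_path)
    return shard_paths


def iter_shard(shard_path):
    """Yield (name, bytes) for each regular file in one shard, in archive order."""
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()
```

In a real setup each payload would be the bytes of an audio file, and each worker would typically be assigned its own subset of shards.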

piraka9011 · 2021-10-21T02:00:19Z

piraka9011
Oct 21, 2021
Author

Nice, great to hear!
Would there be any benefit to using DALI on a single machine w/ multiple GPUs vs. a single GPU, and does it matter whether the dataset is tarred or not?

On a similar note, have you come across the following error when using DALI on multiple GPUs?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:X and cuda:Y

Where X and Y are GPU ids.
This happened on an 8x V100 machine w/ PyTorch 1.9.1, NeMo 1.3.0, and Python 3.8 on Ubuntu 20.04.

Can open an issue if you can reproduce.

5 replies

titu1994 Oct 21, 2021
Maintainer

Hmm, device handling is done by PTL, so I don't know where this is occurring. A stack trace would help.

As for DALI, it is generally faster than the default PT dataloader. How much webdataset reduces this difference is not yet known.

piraka9011 Oct 22, 2021
Author

I've opened an issue for this (#3042), since I was able to repro on multiple systems.

If you have a high-level estimate of how much performance improves, that would be great (e.g., on a system w/ V100, DALI is better than PT by about 10 seconds).

titu1994 Oct 22, 2021
Maintainer

Thanks for the info. We'll look into it soon. I can only give an anecdotal summary of my findings, but DALI will speed up the audio processing part by around 4x compared to the PyTorch implementation. However, audio processing is only one of the costs; the model forward/backward pass (if using large models) will dominate the overall training time.

titu1994 Oct 22, 2021
Maintainer

Audio processing generally goes from around 400 ms to around 100 ms, and DALI has no impact on the rest of the pipeline. So overall training time for large models will still be long, since their forward/backward step takes far more time than data processing.
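
To make that point concrete, here is a rough back-of-the-envelope sketch. It assumes audio processing and the forward/backward step run one after the other with no overlap or prefetching, which real training loops often do not. The 400 ms and 100 ms figures are the anecdotal numbers above; the model step times in the usage loop are made-up values chosen only for illustration, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(data_before_ms, data_after_ms, model_ms):
    """Ratio of per-step time before vs. after speeding up data processing.

    Assumes data processing and the model step are strictly serial
    (no prefetching or overlap), which is a simplification.
    """
    return (data_before_ms + model_ms) / (data_after_ms + model_ms)


if __name__ == "__main__":
    # Hypothetical forward/backward times: a small model and a large model.
    for model_ms in (200, 2000):
        ratio = overall_speedup(400, 100, model_ms)
        print(f"model step {model_ms} ms -> overall speedup {ratio:.2f}x")
```

Under these assumptions, the 4x gain in audio processing shrinks toward 1x overall as the model step grows, which matches the observation that large models stay dominated by forward/backward time.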

piraka9011 Oct 22, 2021
Author

This is the info I was looking for, thank you!
