Better understanding how data is loaded in the DataModule setup method for a multi-GPU setting in NLP #7186
-
I want to better understand the setup and prepare_data methods in a multi-GPU scenario, in the context of NLP and text processing. I have prepared a DataModule which processes a JSON-lines file with pairs of sentences for a translation task. The file contains 10M lines. When using it with multiple GPUs (2 GPUs), will each process have its own copy of the train and validation set (am I right)? Which approach is better in terms of data utilization, loading the data randomly or deterministically? If the data is loaded deterministically, then all GPU processes, in particular the forward and backward passes, will return the same values (for GPU 1 and GPU 2); is that efficient? How are gradients merged, and how are the network weight updates performed?
Replies: 1 comment 7 replies
-
Do all of that either offline in a different script, or do it in the prepare_data hook.
Sounds good. Each GPU/node will run the same, so you will have the same train and val split in all of them (initially). Don't split the data differently for each GPU; that part will be done by the DistributedSampler [1].
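For reference, a minimal sketch of a DataModule along those lines (the file name, dataset class, split ratio, and batch size are made up for illustration): one-off heavy work goes into prepare_data, setup loads the same seeded split on every process, and the per-GPU partitioning is left to the sampler Lightning injects.

```python
import json

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset, random_split


class TranslationPairs(Dataset):
    """Reads one JSON object (a sentence pair) per line."""

    def __init__(self, path):
        with open(path) as f:
            self.pairs = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]


class TranslationDataModule(pl.LightningDataModule):
    def __init__(self, path="pairs.jsonl", batch_size=32):
        super().__init__()
        self.path = path
        self.batch_size = batch_size

    def prepare_data(self):
        # one-time work such as downloading or tokenizing to disk;
        # runs in a single process, so don't assign state here
        pass

    def setup(self, stage=None):
        # runs on every GPU process; a seeded split keeps train/val
        # identical across processes
        full = TranslationPairs(self.path)
        n_val = int(0.01 * len(full))
        self.train_set, self.val_set = random_split(
            full,
            [len(full) - n_val, n_val],
            generator=torch.Generator().manual_seed(42),
        )

    def train_dataloader(self):
        # no manual sharding here: under DDP, Lightning swaps the default
        # sampler for a DistributedSampler
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)
```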
Lightning takes your DataLoader and adds a DistributedSampler. The sampler knows which GPU it is on and will sample only a portion of your data on one GPU and another portion on the other GPU. Each GPU sees a different split of train and a different split of val.
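A quick way to see what that sampler does (a toy standalone example, not Lightning's internal code): constructed with num_replicas=2, each rank iterates over a disjoint subset of the indices.

```python
from torch.utils.data import DistributedSampler

dataset = list(range(8))  # stand-in for the 10M-line dataset

# passing num_replicas/rank explicitly avoids needing an initialized
# process group for this demonstration
for rank in range(2):
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    print(f"rank {rank} gets indices {list(sampler)}")
# rank 0 gets indices [0, 2, 4, 6]
# rank 1 gets indices [1, 3, 5, 7]
```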
As explained above, the dataloader will return different samples on each GPU automatically. Each GPU has the same network weights but uses different data to compute gradients; the gradients are then averaged, so every GPU applies the same update and starts the next forward/backward pass with identical weights [2].
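Roughly what that averaging amounts to, written out by hand (DDP does this for you and overlaps the communication with the backward pass, so you never call anything like this yourself; the function name is just for illustration):

```python
import torch.distributed as dist


def average_gradients(model):
    """Manual equivalent of DDP's gradient all-reduce (illustrative only)."""
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # sum the gradient across all ranks, then divide by the number
            # of ranks so every process holds the same averaged gradient
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
```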
Yes, again this is done for you automatically. References: