Weird DDP RNG/seed behavior #13391
Unanswered
amit-miller
asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment 1 reply
-
Yes, as you've noticed, it is expected: the DDP strategy resets the seed during its setup.
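For context, here is a rough sketch of what that reset amounts to (a paraphrase of the mechanism as an assumption, not the actual Lightning source): `seed_everything()` caches the seed in the `PL_GLOBAL_SEED` environment variable, and the strategy re-seeds every rank from that cached value during setup, which discards any RNG steps taken between `seed_everything()` and `fit()`.

```python
# Rough sketch of the seed reset the DDP strategy performs during setup
# (a paraphrase of the mechanism, not the actual Lightning source).
import os

import pytorch_lightning as pl


def reset_seed_sketch() -> None:
    # pl.seed_everything() stores the seed in PL_GLOBAL_SEED; re-seeding from
    # that cached value undoes any RNG calls made between seed_everything()
    # and fit().
    cached = os.environ.get("PL_GLOBAL_SEED")
    if cached is not None:
        pl.seed_everything(int(cached))
```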
-
Hi.
I'm trying to validate some DDP code. The aim is to show numeric identity when running DDP with more than one worker, compared to standard single-process mode ("Standard"). Batch size and other quantities have been adjusted to make sure that all else is equal.
Clearly, given the sampling involved, the state of the RNG is critical: if the RNG state differs between DDP and Standard modes, the results will differ.
Consider the following pseudo code:
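Roughly the following (a minimal sketch; `DemoModel`, `RandomDataset`, and the exact `Trainer` arguments are placeholders rather than the original code):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    """Deterministic dummy data so the dataset itself does not consume the RNG."""

    def __init__(self, n: int = 64):
        self.data = torch.arange(n * 32, dtype=torch.float32).view(n, 32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


class DemoModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def shared_step(self, batch):
        # Print an RNG-dependent value to compare Standard vs. DDP runs.
        print("rng sample:", torch.rand(1).item())
        return self.layer(batch).sum()

    def training_step(self, batch, batch_idx):
        return self.shared_step(batch)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    pl.seed_everything(1)
    torch.rand(1)  # [*] step the RNG once before fit()
    # [**] adding pl.seed_everything(1) here makes both modes behave the same
    trainer = pl.Trainer(
        max_epochs=1,
        accelerator="cpu",
        devices=2,        # more than one worker for the DDP run
        strategy="ddp",   # omit strategy/devices for Standard mode
    )
    trainer.fit(DemoModel(), DataLoader(RandomDataset(), batch_size=8))
```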
Now, in Standard mode, stepping the RNG once, i.e. including the line marked with [*], changes the value printed in shared_step, as expected.
However, in DDP mode, including or omitting the line marked with [*] does NOT change the value printed by shared_step (this is, of course, reproduced deterministically on all workers).
This is rather surprising.
For my versions (PyTorch 1.12 / Lightning 1.6.3), adding another call to
pl.seed_everything(1)
at the line marked with [**] makes both Standard and DDP modes behave the same. This suggests that perhaps the fit() call in DDP mode is somehow applying a seed reset with a cached value.
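A quick check that supports this (assuming `seed_everything()` caches the seed in the `PL_GLOBAL_SEED` environment variable, as it does in Lightning 1.6.x):

```python
import os

import pytorch_lightning as pl

pl.seed_everything(1)
# The seed is cached here and re-applied by the DDP strategy during fit():
print(os.environ.get("PL_GLOBAL_SEED"))  # -> "1"
```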