When is seed_everything required? #16992
Unanswered
zack-kimble asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi, I'm trying to understand when `seed_everything` is required in a training script.

I have a LightningModule that, when run with DDP, results in a partial intersection of observations between workers. It only occurs when I enable shuffle in the dataloaders. When I debug the batch samplers, the workers all have the same seed, get the same shuffle, and end up with non-intersecting indexes in `torch.utils.data.distributed.DistributedSampler`. Yet somehow the batches end up with intersecting samples.

When I use `seed_everything(1, workers=True)` in the script that calls my module, the intersections go away, but I do not understand why. I've tried to reproduce this with a BoringModel but can't. I think understanding when `seed_everything` is needed might help me narrow down the issue. I am not using any other explicit randomization in the module.
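
For reference, here is a minimal standalone sketch of the sampler check described above (the toy dataset and the two-rank loop are stand-ins for a real DDP launch, not my actual module): with a fixed seed and matching epoch, `DistributedSampler` should give each rank a disjoint slice of the shuffled indices.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 100 samples.
dataset = TensorDataset(torch.arange(100))

# Simulate two DDP ranks that share the same sampler seed and epoch.
index_sets = []
for rank in range(2):
    sampler = DistributedSampler(
        dataset, num_replicas=2, rank=rank, shuffle=True, seed=0
    )
    sampler.set_epoch(0)  # same epoch on every rank -> same global shuffle
    index_sets.append(set(sampler))

# With matching seed/epoch, each rank gets a disjoint partition of the shuffle.
print("overlap between ranks:", index_sets[0] & index_sets[1])  # expected: set()
```

This is exactly what I see when debugging the samplers directly, which is why the intersecting batches (and why `seed_everything` fixes them) are confusing.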