
Combine outputs in test epochs when using DDP #11086

all_gather is different from all_reduce: it doesn't apply any math operation to the outputs.
Roughly:

all_gather -> collect the outputs from all devices and keep each one as is
all_reduce -> collect the outputs from all devices and reduce them to a single value (apply a math op, such as a sum)

Is all_gather not working for you?
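
To make the distinction above concrete, here is a minimal sketch that simulates both collectives in plain Python, with no GPUs or process groups involved. The helper names `simulated_all_gather` and `simulated_all_reduce` and the per-rank values are hypothetical and only for illustration; in a real DDP run the corresponding calls would be `torch.distributed.all_gather` / `torch.distributed.all_reduce`, or `self.all_gather` inside a LightningModule.

```python
# Pure-Python simulation of the two collectives (not the torch.distributed API).
# Each list element stands for the output held by one rank/device.
import operator
from functools import reduce


def simulated_all_gather(per_rank_outputs):
    """Every rank ends up with a copy of all outputs, unchanged."""
    return [list(per_rank_outputs) for _ in per_rank_outputs]


def simulated_all_reduce(per_rank_outputs, op=operator.add):
    """Every rank ends up with the same single value: the outputs combined by `op`."""
    reduced = reduce(op, per_rank_outputs)
    return [reduced for _ in per_rank_outputs]


# Hypothetical example: four ranks, one scalar output per rank.
per_rank_outputs = [1.0, 2.0, 3.0, 4.0]

gathered = simulated_all_gather(per_rank_outputs)
reduced = simulated_all_reduce(per_rank_outputs)

print(gathered[0])  # what rank 0 sees after all_gather: [1.0, 2.0, 3.0, 4.0]
print(reduced[0])   # what rank 0 sees after all_reduce (sum): 10.0
```

For combining test-epoch outputs, the gather form is usually the one you want when you need every device's individual outputs (for example, to compute a metric over the full test set yourself), while the reduce form fits when a single aggregated number is enough.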

Answer selected by WouterDurnez
Labels
strategy: ddp DistributedDataParallel