DDP sampler when testing #15658
Unanswered · Hannibal046 asked this question in DDP / multi-GPU / multi-node
Hello, I define the `test_dataloader` in my LightningModule along the lines of the sketch below, and I know Lightning will automatically wrap it with a `DistributedSampler`.
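A minimal sketch of that dataloader hook (the dataset and batch size here are just placeholders for my real setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def test_dataloader(self):
        # Plain sequential dataloader; under DDP, Lightning swaps its
        # sampler for a DistributedSampler automatically.
        dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))  # placeholder data
        return DataLoader(dataset, batch_size=8, shuffle=False)
```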
This is totally fine for training, but for testing it has two drawbacks:

1. The `DistributedSampler` evenly distributes the data across GPUs, padding with repeated samples when the dataset size is not divisible by the number of processes. That means I get duplicate data when testing, which is unacceptable when reporting my results, especially for publishing a paper. My workaround is to define `num_test_data` in my LightningModule and manually truncate the test outputs after the `all_gather` operation (see the sketch right after this list).
2. The samples are handed out to the GPUs in an interleaved order, so even though I set `shuffle=False` when I init my `test_dataloader`, the outputs gathered from all ranks are not in the original dataset order.
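Roughly, that truncation workaround looks like the sketch below (dummy model and shapes; the parts that matter are the `all_gather` call, undoing the interleaved order, and slicing back to `num_test_data`):

```python
import torch
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self, num_test_data: int):
        super().__init__()
        self.num_test_data = num_test_data      # true size of the test set
        self.layer = torch.nn.Linear(1, 1)      # dummy model

    def forward(self, x):
        return self.layer(x)

    def test_step(self, batch, batch_idx):
        (x,) = batch
        return self(x)                          # per-rank predictions, shape (batch, 1)

    def test_epoch_end(self, outputs):
        preds = torch.cat(outputs, dim=0)       # (samples_on_this_rank, 1)
        gathered = self.all_gather(preds)       # (world_size, samples_on_this_rank, 1)
        world_size, per_rank = gathered.shape[:2]
        # DistributedSampler (shuffle=False) gives rank r the indices
        # r, r + world_size, r + 2 * world_size, ...; transposing restores
        # the original dataset order before dropping the padded duplicates.
        restored = gathered.transpose(0, 1).reshape(world_size * per_rank, -1)
        restored = restored[: self.num_test_data]
        # ...compute the final metric on `restored` and log it here...
```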
So I am wondering: is it possible to make the default distributed sampler for the `test_dataloader` in DDP behave like some kind of `Sequentially_and_UnEvenly_Distributed` data sampler? What is the elegant solution for this? I know I can reorder the gathered data myself. Also, I could pass `Trainer(replace_sampler_ddp=False)`, but in that case I have to take care of `single_gpu` and `multi_gpu` training myself (roughly as in the sketch below), which is not what I want.
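For reference, this is roughly the branching I would otherwise have to maintain with `replace_sampler_ddp=False` (placeholder dataset; the check on the default process group decides whether a `DistributedSampler` is needed):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def build_test_dataloader():
    dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))  # placeholder data
    if dist.is_available() and dist.is_initialized():
        # Multi-GPU: shard the data by hand, without shuffling.
        sampler = DistributedSampler(dataset, shuffle=False, drop_last=False)
        return DataLoader(dataset, batch_size=8, sampler=sampler)
    # Single GPU / CPU: a plain sequential dataloader is enough.
    return DataLoader(dataset, batch_size=8, shuffle=False)
```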
Maybe Lightning could offer a hook in `LightningModule` to configure our own DDP sampler, something like the sketch below, which would only be called when using multi-GPU:
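What I have in mind is something like this (`configure_test_sampler` is a made-up hook name, not an existing Lightning API, and the sampler is only a sketch of the "sequential and uneven" behaviour):

```python
import math

import pytorch_lightning as pl
from torch.utils.data import Sampler


class SequentialUnevenDistributedSampler(Sampler):
    """Give each rank a contiguous slice of the dataset, with no shuffling
    and no padding, so the last rank may receive fewer samples."""

    def __init__(self, dataset, num_replicas: int, rank: int):
        chunk = math.ceil(len(dataset) / num_replicas)
        self.indices = list(range(len(dataset)))[rank * chunk : (rank + 1) * chunk]

    def __iter__(self):
        return iter(self.indices)

    def __len__(self):
        return len(self.indices)


class MyModel(pl.LightningModule):
    # Hypothetical hook -- not part of Lightning today. The Trainer would call
    # it only when running with a distributed strategy.
    def configure_test_sampler(self, dataset, num_replicas: int, rank: int):
        return SequentialUnevenDistributedSampler(dataset, num_replicas, rank)
```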