How to calculate metric over entire validation set when training with DDP? #3225
Replies: 14 comments 18 replies
-
I found a workaround (see Line 166): now all processes run inference on the entire validation set, which seems inefficient (probably the same speed as single-GPU validation), but they all return the same metrics. In MMDetection there is a class that handles this;
I'll take a look at the source code to see whether something like it could be integrated into Lightning.
-
https://pytorch-lightning.readthedocs.io/en/latest/metrics.html#auroc
-
I run this on my metrics, though you might have to cast your tensor to CUDA (self.device) first.
-
How can I calculate a custom metric over the entire set?
-
Interested in this as well; so far I only know how to calculate on each GPU and then reduce.
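A minimal sketch of that per-GPU-then-reduce pattern, assuming torch.distributed with the gloo backend. It uses a single-process group (world_size=1) purely so it runs standalone; under real DDP the launcher sets up the group and each rank contributes its own counts. The ddp_accuracy helper name is mine, not a Lightning API:

```python
import os
import torch
import torch.distributed as dist

# Single-process group purely for illustration; under real DDP the
# launcher initializes the group and every rank runs this same code.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29511")
dist.init_process_group("gloo", rank=0, world_size=1)

def ddp_accuracy(correct: int, total: int) -> float:
    # Sum the per-rank counts across all processes, then divide once,
    # so every rank ends up with the same global accuracy.
    stats = torch.tensor([correct, total], dtype=torch.float64)
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    return (stats[0] / stats[1]).item()

acc = ddp_accuracy(correct=30, total=40)
print(acc)  # 0.75
dist.destroy_process_group()
```

Reducing counts (rather than per-rank averages) keeps the result exact even when ranks see different numbers of samples.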
-
I am currently trying something similar to what you attempted, @s-rog: the idea is to pickle the results on each rank and then collect them afterwards on rank 0.
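One way to do that pickle-and-collect step is torch.distributed.all_gather_object (available in torch >= 1.8), which pickles each rank's object under the hood. A sketch, again using a single-process gloo group as a stand-in for a real DDP setup; collect_on_rank_zero is a hypothetical helper name:

```python
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29512")
dist.init_process_group("gloo", rank=0, world_size=1)

def collect_on_rank_zero(local_results):
    # all_gather_object pickles each rank's object and distributes
    # the full list to every rank; we only keep it on rank 0.
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_results)
    if dist.get_rank() == 0:
        return [item for part in gathered for item in part]
    return None

merged = collect_on_rank_zero([0.2, 0.8])
print(merged)  # [0.2, 0.8] with one rank; all ranks' lists concatenated under DDP
dist.destroy_process_group()
```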
-
Please see #3159 for a temporary solution. I have tested it and it works in my code.
-
@psinger looking into the torch.distributed docs, I think we need to:
or
I'm assuming this is only for logging purposes, as backprop would probably cause issues. Also, this is just from reading the docs; I haven't tried it out yet. Edit:
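For logging-only metrics, the relevant collective is typically all_gather, which gives every rank a copy of every other rank's tensor (note it requires the per-rank tensors to have identical shapes). A hedged sketch with a single-process gloo group standing in for DDP; gather_tensor is my own helper name:

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29513")
dist.init_process_group("gloo", rank=0, world_size=1)

def gather_tensor(local: torch.Tensor) -> torch.Tensor:
    # Every rank receives every rank's tensor (including its own),
    # so the concatenated result is identical on all processes.
    buckets = [torch.zeros_like(local) for _ in range(dist.get_world_size())]
    dist.all_gather(buckets, local)
    return torch.cat(buckets)

preds = gather_tensor(torch.tensor([0.25, 0.75]))
print(preds.tolist())  # [0.25, 0.75]
dist.destroy_process_group()
```

As noted above, the gathered tensors carry no autograd history, which is why this is suited to logging rather than backprop.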
-
@awaelchli The pl metric AUROC does not have reduce_group or reduce_op defined; can it still reduce across DDP?
-
@sooheon hmm, I don't think you can define a meaningful reduction operation for that metric. The best approach is to gather all pairs and then compute the ROC once over all the data.
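To illustrate why gathering first matters: AUROC can be computed over the pooled (prediction, target) pairs via the Mann-Whitney formulation, and there is no exact way to merge per-shard AUROC values. A small sketch in plain torch (the auroc helper is mine, not a Lightning API):

```python
import torch

def auroc(preds: torch.Tensor, targets: torch.Tensor) -> float:
    # Mann-Whitney formulation: the probability that a randomly chosen
    # positive is scored above a randomly chosen negative (ties count half).
    pos = preds[targets == 1]
    neg = preds[targets == 0]
    wins = (pos.unsqueeze(1) > neg.unsqueeze(0)).sum()
    ties = (pos.unsqueeze(1) == neg.unsqueeze(0)).sum()
    return ((wins + 0.5 * ties) / (pos.numel() * neg.numel())).item()

# Pretend these were already gathered from all ranks.
preds = torch.tensor([0.9, 0.6, 0.4, 0.1])
targets = torch.tensor([1, 0, 1, 0])
score = auroc(preds, targets)
print(score)  # 0.75
```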
-
Currently if I use AUROC in val_epoch_end, does this happen?
-
Note that we are working on implementing aggregation for metrics (this PR #3321 has started the process) such that each metric gets an
-
Class-based metrics have been revamped!
-
I have the same problem. I want to compute some metrics on the entire validation set while using DDP. Could you confirm that calculating F1, ROC AUC, and PR AUC is supported in the latest PyTorch Lightning version now? And if I want to calculate a custom metric, what should I do? Thanks in advance!
-
I started refactoring my code into Lightning yesterday. When I perform validation, I save all the predictions over the entire validation set and then calculate the validation metrics on all validation data at once. This is especially important for metrics like AUROC.
I am training a model with DDP on 4 GPUs. I have a validation_epoch_end method to calculate a metric over the entire validation set. Here is a script that illustrates the problem I'm encountering:
snippet.zip
However, when using DDP, this method gets called separately in each process, so I end up calculating the metric 4 times, each on 1/4 of the overall validation set. When I look at the values of each of the 4 AUROCs and the value that gets saved to checkpoint_on, the saved value is just 1 of the 4 (I'm assuming the one calculated by the process with rank 0?). I tried using the built-in pytorch_lightning metrics, but those give me a RuntimeError: Tensors must be CUDA and dense. This is on the most current branch (0.9.1.dev).
There may be a simple solution to this, but I spent the last few hours combing through the docs and existing issues without any luck.
Thanks in advance to anyone who can help.