Does self.log(..., on_epoch=True)
cost the same time as torchmetrics.Accuracy
in distributed mode?
#9172
Unanswered
marsggbo
asked this question in
DDP / multi-GPU / multi-node

In PyTorch Lightning there are two ways to calculate accuracy in distributed mode. In my code it seems that the accuracy is calculated twice, which wastes a lot of time. How can I save that time when computing distributed metrics if I want to both log the metric and obtain its value?

Replies: 1 comment
You are right that this is duplicating the calculation. Instead, try logging the metric object itself:

```python
def training_step(self, batch: Any, batch_idx: int):
    ...
    self.train_accuracy(predictions, targets)
    self.log('train/acc', self.train_accuracy, on_step=True, on_epoch=True)
```
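To make this concrete, here is a minimal, self-contained sketch of the pattern (the class name, the linear backbone, and num_classes are illustrative and not from this thread). Registering a torchmetrics.Accuracy module on the LightningModule and passing the metric object to self.log lets the metric accumulate state locally on each batch and lets Lightning call .compute() once at epoch end on the state reduced across DDP processes, so the accuracy is not calculated a second time:

```python
import torch
import torchmetrics
import pytorch_lightning as pl


class LitClassifier(pl.LightningModule):
    """Hypothetical module illustrating the metric-object logging pattern."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.model = torch.nn.Linear(32, num_classes)  # placeholder backbone
        # torchmetrics >= 0.11 requires the `task` argument; older releases
        # accepted torchmetrics.Accuracy() with no arguments.
        self.train_accuracy = torchmetrics.Accuracy(
            task="multiclass", num_classes=num_classes
        )

    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        logits = self.model(inputs)
        loss = torch.nn.functional.cross_entropy(logits, targets)

        # Update the metric state with this batch (cheap and local to each rank).
        self.train_accuracy(logits, targets)

        # Log the metric *object*: on_step logs the batch-level value, while
        # on_epoch has Lightning call .compute() once at epoch end on the
        # DDP-synchronized state instead of recomputing the accuracy.
        self.log("train/acc", self.train_accuracy, on_step=True, on_epoch=True)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```

Trained with a distributed setup such as Trainer(accelerator="gpu", devices=2, strategy="ddp"), this should log a batch-level train/acc on each step and a single synchronized train/acc_epoch at the end of the epoch, without duplicating the reduction.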