Training a model in a callback with its own dataset and DDP #11109
Unanswered
ant0nsc asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment 1 reply
-
Hi everyone, we are using self-supervised learning to build representations, and we evaluate the quality of the embeddings in callbacks on a supervised task by fitting a linear model on top of the embedding (so far, so standard).
We are now considering adding more and more such callbacks, each using a different dataset. This increases the burden of managing datasets: the main encoder training run has to know about every dataset that each callback needs. We'd rather have a clear separation of concerns, where each callback handles the data it needs.
If we train each callback's model separately, each callback needs to iterate over its own dataset. That's easy in a single-GPU setup, but we have no idea how to handle it with multiple GPUs. What are the tricks that Lightning applies under the hood to distribute data correctly across all DDP processes, and how can we replicate them correctly?
Also, does anyone have other thoughts/views on such a setup?
Thanks for any input!
-
I personally would not recommend Lightning Callbacks for this use case. Callbacks are meant for non-essential code. I'd recommend keeping all of your modeling code inside your LightningModule implementation.
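If you do end up handling data inside a callback anyway: the main trick the Trainer applies to its own dataloaders under DDP is wrapping the dataset in a torch.utils.data.DistributedSampler keyed on the process's rank and world size, and that is straightforward to replicate by hand. Below is a minimal sketch of what that could look like; LinearProbeCallback, probe_head, and the encoder attribute on the LightningModule are placeholder names for this example, not Lightning API.
```python
import torch
from torch.utils.data import DataLoader, DistributedSampler
from pytorch_lightning import Callback


class LinearProbeCallback(Callback):
    """Evaluate a frozen encoder with a linear probe on this callback's own dataset."""

    def __init__(self, dataset, probe_head, batch_size=256):
        self.dataset = dataset        # owned by the callback, unknown to the Trainer
        self.probe_head = probe_head  # linear layer fitted on frozen embeddings (placeholder)
        self.batch_size = batch_size

    def on_validation_epoch_end(self, trainer, pl_module):
        # Replicate what the Trainer does for its own dataloaders under DDP:
        # give every process a disjoint shard of the dataset via DistributedSampler.
        sampler = DistributedSampler(
            self.dataset,
            num_replicas=trainer.world_size,
            rank=trainer.global_rank,
            shuffle=False,
        )
        loader = DataLoader(self.dataset, batch_size=self.batch_size, sampler=sampler)

        probe = self.probe_head.to(pl_module.device)
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in loader:
                images = images.to(pl_module.device)
                labels = labels.to(pl_module.device)
                # `pl_module.encoder` is an assumed attribute of your LightningModule.
                preds = probe(pl_module.encoder(images)).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.numel()

        # Each rank only saw its own shard, so aggregate counts before reporting.
        counts = pl_module.all_gather(
            torch.tensor([correct, total], device=pl_module.device)
        )
        accuracy = (counts[..., 0].sum() / counts[..., 1].sum()).item()
        if trainer.is_global_zero:
            print(f"Linear probe accuracy: {accuracy:.4f}")
```
The same sampler setup applies if you train the probe inside the callback rather than only evaluating it; just keep in mind that each rank only sees its shard, so anything you report needs to be aggregated across processes (here via all_gather).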