Training a model in a callback with its own dataset and DDP #11109
Unanswered
ant0nsc asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment 1 reply
-
Hi everyone, we are using self-supervised learning to build representations, and we evaluate the quality of the embeddings in callbacks on a supervised task by fitting a linear model on top of the embedding (so far, so standard).
We are now considering adding more and more such callbacks, each using a different dataset. This increases the burden of managing datasets: the main encoder training run has to know about every dataset that each callback needs. We'd rather have a clear separation of concerns, where each callback handles the data it needs.
If we train each callback's model separately, each callback needs to iterate over its own dataset. That's easy in a single-GPU setup, but we have no idea how to handle it with multiple GPUs. What are the tricks that Lightning applies under the hood to distribute data correctly across all DDP processes, and how can we replicate them correctly?
Also, does anyone have other thoughts/views on such a setup?
Thanks for any input!
-
I personally would not recommend Lightning Callbacks for this use case. Callbacks are meant for non-essential code. I'd recommend keeping all of your modeling code inside your LightningModule implementation.
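If you do end up handling data inside a callback anyway: the main trick the Trainer applies to its own dataloaders under DDP is wrapping the dataset in a torch.utils.data.DistributedSampler keyed on the process's rank and world size, and that is straightforward to replicate by hand. Below is a minimal sketch of what that could look like; LinearProbeCallback, probe_head, and the encoder attribute on the LightningModule are placeholder names for this example, not Lightning API.
```python
import torch
from torch.utils.data import DataLoader, DistributedSampler
from pytorch_lightning import Callback


class LinearProbeCallback(Callback):
    """Evaluate a frozen encoder with a linear probe on this callback's own dataset."""

    def __init__(self, dataset, probe_head, batch_size=256):
        self.dataset = dataset        # owned by the callback, unknown to the Trainer
        self.probe_head = probe_head  # linear layer fitted on frozen embeddings (placeholder)
        self.batch_size = batch_size

    def on_validation_epoch_end(self, trainer, pl_module):
        # Replicate what the Trainer does for its own dataloaders under DDP:
        # give every process a disjoint shard of the dataset via DistributedSampler.
        sampler = DistributedSampler(
            self.dataset,
            num_replicas=trainer.world_size,
            rank=trainer.global_rank,
            shuffle=False,
        )
        loader = DataLoader(self.dataset, batch_size=self.batch_size, sampler=sampler)

        probe = self.probe_head.to(pl_module.device)
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in loader:
                images = images.to(pl_module.device)
                labels = labels.to(pl_module.device)
                # `pl_module.encoder` is an assumed attribute of your LightningModule.
                preds = probe(pl_module.encoder(images)).argmax(dim=-1)
                correct += (preds == labels).sum().item()
                total += labels.numel()

        # Each rank only saw its own shard, so aggregate counts before reporting.
        counts = pl_module.all_gather(
            torch.tensor([correct, total], device=pl_module.device)
        )
        accuracy = (counts[..., 0].sum() / counts[..., 1].sum()).item()
        if trainer.is_global_zero:
            print(f"Linear probe accuracy: {accuracy:.4f}")
```
The same sampler setup applies if you train the probe inside the callback rather than only evaluating it; just keep in mind that each rank only sees its shard, so anything you report needs to be aggregated across processes (here via all_gather).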