Large Model Initialization with FSDP #20658
Unanswered
jthomy
asked this question in
DDP / multi-GPU / multi-node
Replies: 1 comment
related: |
With strategies like FSDP: if I have a model that is too large to fit into CPU memory (and especially too large to fit n_gpus times into memory), simply instantiating it in the configure_model() hook will run out of memory.
What is Lightning's intended way to initialize the model so that the full model is only loaded once overall? Ideally I would do this directly on the GPU for faster startup times. I would load the model from a non-distributed checkpoint, but I'm happy to hear any suggestions.
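To make the memory concern concrete, here is a rough back-of-the-envelope sketch in plain Python. The parameter count, dtype size, and GPU count are hypothetical placeholders, not numbers from my actual setup, and it only counts the weights (no optimizer state, activations, or framework overhead). It assumes one process per GPU, each building the full model on the host.

```python
def host_memory_gib(n_params: int, bytes_per_param: int, n_copies: int) -> float:
    """Host RAM in GiB needed to hold n_copies full copies of the weights."""
    return n_params * bytes_per_param * n_copies / 2**30


# Hypothetical values chosen only for illustration.
N_PARAMS = 7_000_000_000   # assumed parameter count
BYTES_PER_PARAM = 4        # assumed fp32 weights
N_GPUS = 8                 # assumed one process per GPU on a single node

# One full copy of the weights on the host.
one_copy = host_memory_gib(N_PARAMS, BYTES_PER_PARAM, 1)

# Every process instantiating the full model at the same time.
all_ranks = host_memory_gib(N_PARAMS, BYTES_PER_PARAM, N_GPUS)

# What each rank would hold if the weights were split evenly across ranks.
per_rank_shard = one_copy / N_GPUS

print(f"one full copy:        {one_copy:.1f} GiB")
print(f"{N_GPUS} simultaneous copies: {all_ranks:.1f} GiB")
print(f"even shard per rank:  {per_rank_shard:.1f} GiB")
```

The gap between the second and third numbers is what I am trying to avoid: each rank only needs its shard in the end, but naive instantiation in every process pays for the full model n_gpus times.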