Current best practices to initialize massive (50B parameter+) models #16944

Hi! The init_meta_context functionality was replaced by a torchdistx integration in #13868. You can do the following:

from torchdistx.deferred_init import deferred_init

# Constructs the module without allocating real storage for its parameters
model = deferred_init(YourLightningModule)

The Trainer will then materialize the model for you. Note that this is very experimental, and you might run into installation issues with torchdistx.
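To make the idea concrete, here is a minimal, dependency-free sketch of the deferred-init pattern: record the constructor and its arguments up front, and only allocate when explicitly materialized. The `Deferred` and `BigModel` names are illustrative stand-ins, not torchdistx's actual implementation.

```python
class Deferred:
    """Captures a constructor call now; builds the object later."""

    def __init__(self, cls, *args, **kwargs):
        self._cls = cls
        self._args = args
        self._kwargs = kwargs

    def materialize(self):
        # Only here is the (potentially huge) object actually built.
        return self._cls(*self._args, **self._kwargs)


class BigModel:
    def __init__(self, n):
        # Stand-in for allocating n parameters' worth of memory.
        self.weights = [0.0] * n


deferred = Deferred(BigModel, 1000)  # cheap: nothing allocated yet
model = deferred.materialize()       # allocation happens on demand
```

torchdistx does this at the tensor level (parameters live on a "fake" device until materialization), which is what lets a 50B-parameter model be described before any real memory is committed.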

In the long term, we'll adopt the fake tensor mode from PyTorch: #16448.

Otherwise, for a stable(r) solution, you can use the DeepSpeed integration: https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html#deepspeed-zero-stage-3
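For reference, enabling DeepSpeed ZeRO Stage 3 in Lightning is a one-line strategy setting; this is a configuration sketch only (the device/precision values are example choices, and running it requires `deepspeed` to be installed):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    strategy="deepspeed_stage_3",  # shards params, gradients, and optimizer state
    accelerator="gpu",
    devices=8,
    precision=16,
)
# trainer.fit(YourLightningModule())
```

With Stage 3, parameters are partitioned across ranks and can be initialized shard-by-shard, which avoids ever holding the full model on a single device.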

Answer selected by azton