OOM with batch size 1 with ViT-bigG on 40GB GPU #296
Comments
Weird. I once tested ViT-g-14 on an RTX 3090 (10G) and it worked; you could refer to that. Maybe you could try multiple machines.
Sorry, I mean
Sorry for the misunderstanding.
I think we've got two 'easy' options right now: DeepSpeed ZeRO (the PR for this, #264, might be worth testing) or PyTorch native FSDP. I was talking with someone close to TPUs & PyTorch XLA recently, and they were strongly recommending giving FSDP a try for large-scale runs (there's both an XLA-specific variant and the normal PyTorch one). Going full tensor parallelism is more work, and I feel things are about to change with upcoming native PyTorch features (compilation with annotations for parallelism) such that needing to do it Megatron-style will be a thing of the past.
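For reference, a minimal sketch of what wrapping the open_clip model in PyTorch native FSDP could look like. This is not the project's implementation; the wrap unit (`ResidualAttentionBlock` from `open_clip.transformer`), the model name, and `use_orig_params` (requires a recent PyTorch) are assumptions here.

```python
# Hypothetical FSDP sketch; block class and model name are assumptions, not from the issue.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

import open_clip
from open_clip.transformer import ResidualAttentionBlock  # assumed wrap unit

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model, _, _ = open_clip.create_model_and_transforms("ViT-bigG-14")

# Shard at the transformer-block level so each rank holds only a slice of the
# parameters and gradients, instead of a full replica of the ~2.5B-param model.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={ResidualAttentionBlock},
)

fsdp_model = FSDP(
    model.cuda(),
    auto_wrap_policy=wrap_policy,
    use_orig_params=True,  # assumed; convenient when the optimizer expects named params
)
```

Something like this would be launched with `torchrun --nproc_per_node=<num_gpus> train.py`; the interesting knob is the wrap policy, since sharding granularity determines how much of the model each rank must materialize at once.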
Seems like progress is being made with FSDP, and we also think the OOM was due to model size plus activations.
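Since activations are part of the problem, gradient (activation) checkpointing is the usual complement to sharding: recompute block activations during backward instead of storing them. A minimal, self-contained sketch of the idea is below; the dimensions and block structure are illustrative only, not the actual ViT-bigG configuration, and if I recall correctly open_clip's training script exposes a similar switch (check for a `--grad-checkpointing` flag).

```python
# Illustrative activation-checkpointing sketch; dims/depth are placeholders.
import torch
from torch.utils.checkpoint import checkpoint


class Block(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x):
        return x + self.mlp(x)


class Tower(torch.nn.Module):
    def __init__(self, dim: int = 1024, depth: int = 48, grad_checkpointing: bool = True):
        super().__init__()
        self.blocks = torch.nn.ModuleList(Block(dim) for _ in range(depth))
        self.grad_checkpointing = grad_checkpointing

    def forward(self, x):
        for blk in self.blocks:
            if self.grad_checkpointing and self.training:
                # Drop intermediate activations and recompute them in backward,
                # trading extra compute for a large cut in activation memory.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x
```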
Similar to #261, I'm getting OOM with batch size 1 on a 40GB GPU with ViT-G.
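A minimal single-GPU repro along these lines would exercise one forward/backward step and report peak memory; the model name, image size, and the dummy loss below are assumptions for illustration, not taken from the issue.

```python
# Hypothetical repro sketch: one train step at batch size 1, then peak memory.
import torch
import open_clip

model, _, _ = open_clip.create_model_and_transforms("ViT-bigG-14")
model = model.cuda().train()

images = torch.randn(1, 3, 224, 224, device="cuda")          # batch size 1
texts = open_clip.tokenize(["a photo of a cat"]).cuda()

image_features, text_features, logit_scale = model(images, texts)
loss = -(image_features * text_features).sum()                # dummy loss, just to trigger backward
loss.backward()

print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```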