OOM with batch size 1 when with ViT-bigG on 40GB GPU #296

Closed
mitchellnw opened this issue Dec 15, 2022 · 5 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@mitchellnw
Contributor

Similarly to #261, getting OOM with batch size 1 on 40GB GPU with ViT-G.

mitchellnw added the bug and help wanted labels on Dec 15, 2022
@OrangeSodahub
Contributor

Weird. I once tested ViT-g-14 on an RTX 3090 (10G) and it worked; you could refer to this. Maybe you could try multiple machines.

mitchellnw changed the title from "OOM with batch size 1 when with ViT-G on 40GB GPU" to "OOM with batch size 1 when with ViT-bigG on 40GB GPU" on Dec 15, 2022
@mitchellnw
Contributor Author

Sorry, I meant bigG, not g.

@OrangeSodahub
Contributor

Sorry for the misunderstanding.

@rwightman
Collaborator

I think we've got two 'easy' options right now, DeepSpeed ZeRO (the PR for this, #264, might be worth testing) or PyTorch native FSDP. I was talking w/ someone close to TPUs & PyTorch XLA recently, and they were strongly recommending giving FSDP a try for large scale runs (there's both an XLA specific variant and a normal PyTorch one).

Going full tensor parallelism is more work and I feel things are about to change w/ upcoming native PyTorch features (compilation w/ annotations for parallelism) such that needing to do it Megatron style will be a thing of the past.
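As a rough illustration of the native FSDP route mentioned above, here is a minimal sketch of what sharding the open_clip model could look like. The model name, the choice of ResidualAttentionBlock as the wrap unit, and the bf16 mixed-precision setting are assumptions for illustration, not a tested recipe from this thread:

```python
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

import open_clip
from open_clip.transformer import ResidualAttentionBlock  # assumed block class to shard on


def build_fsdp_model(model_name: str = "ViT-bigG-14"):
    # one process per GPU, launched e.g. via torchrun
    dist.init_process_group("nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = open_clip.create_model(model_name)

    # shard at the granularity of individual transformer blocks so no single
    # rank ever materializes the full set of ViT-bigG parameters at once
    wrap_policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={ResidualAttentionBlock},
    )
    model = FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
        device_id=local_rank,
    )
    return model
```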

@mitchellnw
Contributor Author

Seems like progress is being made with FSDP, and we also think the OOM was due to model size + activations.
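As a rough sanity check on that explanation, some back-of-the-envelope arithmetic (the ~2.5B parameter count for ViT-bigG is an approximation) shows why plain fp32 AdamW training is already tight on a 40GB card before any activations are counted:

```python
# rough memory estimate for unsharded fp32 AdamW training of ViT-bigG
params = 2.5e9                 # approximate total parameter count (assumption)
bytes_per_param_fp32 = 4

weights    = params * bytes_per_param_fp32       # ~10 GB
grads      = params * bytes_per_param_fp32       # ~10 GB
adam_state = params * 2 * bytes_per_param_fp32   # exp_avg + exp_avg_sq, ~20 GB

total_gb = (weights + grads + adam_state) / 1e9
print(f"model states alone: ~{total_gb:.0f} GB")  # ~40 GB, leaving nothing for activations
```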
