
HF Accelerate FP8 uses more GPU memory than FP16 when training an LLM #1429

Liufeiran123 opened this issue Jan 28, 2025 · 1 comment

Liufeiran123 commented Jan 28, 2025

HF Accelerate FP8 uses more GPU memory than FP16 when training an LLM.
torch == 2.5.1
accelerate == 1.3.0
transformers == 4.45.0
transformer-engine == 1.13

pggPL (Collaborator) commented Jan 28, 2025

FP8 training stores the weights (and gradients, optimizer state, etc.) in higher precision and casts them to FP8 before each GEMM. It is faster, but it consumes more memory, so this is expected behavior.
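For reference, a minimal sketch of this pattern with Transformer Engine (the layer sizes, recipe settings, and BF16 master dtype below are illustrative assumptions, not taken from this issue): the master weights live in BF16, and the FP8 copies plus per-tensor scaling state are allocated on top of them during the GEMMs.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Master weights are kept in the model dtype (BF16 here).
# FP8 is only used for the GEMM inputs inside fp8_autocast, so the
# FP8 casts and scaling metadata are *extra* memory on top of them.
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()

# Hybrid recipe: E4M3 for the forward pass, E5M2 for the backward pass.
recipe = DelayedScaling(fp8_format=Format.HYBRID)

x = torch.randn(8, 4096, dtype=torch.bfloat16,
                device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)      # weights/activations cast to FP8 just for the GEMM

y.sum().backward()    # gradients are accumulated in BF16/FP32, not FP8
```

In other words, FP8 here trades memory for GEMM throughput; the higher-precision master copies and the FP8 casts alongside them are what shows up as the extra GPU memory.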
