
HF Accelerate FP8 uses more GPU memory than FP16 when training an LLM #1429

Liufeiran123 opened this issue Jan 28, 2025 · 1 comment

Liufeiran123 commented Jan 28, 2025

HF Accelerate FP8 uses more GPU memory than FP16 when training an LLM.
torch == 2.5.1
accelerate == 1.3.0
transformers == 4.45.0
transformer-engine == 1.13

pggPL (Collaborator) commented Jan 28, 2025

FP8 training stores the weights (and gradients, optimizer state, etc.) in higher precision and casts them to FP8 before each GEMM. It is faster, but it consumes more memory, so this is expected behavior.
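For reference, a minimal sketch of this pattern with Transformer Engine (the layer sizes, recipe settings, and BF16 master dtype below are illustrative assumptions, not taken from this issue): the master weights live in BF16, and the FP8 copies plus per-tensor scaling state are allocated on top of them during the GEMMs.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Master weights are kept in the model dtype (BF16 here).
# FP8 is only used for the GEMM inputs inside fp8_autocast, so the
# FP8 casts and scaling metadata are *extra* memory on top of them.
layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()

# Hybrid recipe: E4M3 for the forward pass, E5M2 for the backward pass.
recipe = DelayedScaling(fp8_format=Format.HYBRID)

x = torch.randn(8, 4096, dtype=torch.bfloat16,
                device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)      # weights/activations cast to FP8 just for the GEMM

y.sum().backward()    # gradients are accumulated in BF16/FP32, not FP8
```

In other words, FP8 here trades memory for GEMM throughput; the higher-precision master copies and the FP8 casts alongside them are what shows up as the extra GPU memory.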
