System Info
- transformers 5.9.0
- peft 0.19.1
- accelerate 1.13.0
- torch 2.9.1 (CUDA 12.9)
- Python 3.12
- Platform: SageMaker training job, 2× p5.48xlarge (16× H100 80GB), FSDP full_shard, attn_implementation="eager"
- Model: deepseek-ai/DeepSeek-V4-Flash (284B MoE, published FP8-only: quantization_config.quant_method == "fp8", e4m3, 128×128 weight blocks)
Who can help?
@SunMarc @MekkCyber (quantization), @BenjaminBossan (PEFT)
Description
Fine-tuning an FP8-quantized checkpoint with LoRA fails at Trainer.__init__:
ValueError: The model you are trying to fine-tune is quantized with QuantizationMethod.FP8
but that quantization method do not support training. Please open an issue on GitHub ...
The error message and the docs both say the supported route for a quantized base model is to attach trainable LoRA adapters on top. We do exactly that via peft.get_peft_model(model, LoraConfig(...)) before constructing the Trainer, yet the guard still raises.
Root cause (transformers 5.9.0, trainer_utils.py::validate_quantization_for_training):
_is_quantized_and_base_model = getattr(model, "is_quantized", False) and not getattr(
    model, "_hf_peft_config_loaded", False
)
...
elif _is_quantized_and_base_model and not _quantization_method_supports_training:
    raise ValueError(... "do not support training" ...)
- get_peft_model wraps the model in a PeftModel, but does not set _hf_peft_config_loaded on the (inner) base model (confirmed: that attribute is never assigned anywhere in peft; it is only set by transformers' native PeftAdapterMixin.add_adapter/load_adapter, integrations/peft.py).
- PeftModel.__getattr__ proxies is_quantized to the FP8 base → True.
- So _is_quantized_and_base_model stays True, and because the FP8 quantizer reports is_trainable == False, the elif branch raises, even though a LoRA adapter is attached.
Note the earlier if branch does exempt PEFT models (... and not _is_peft_model(model) ...), but the elif (quant-method-not-trainable) has no _is_peft_model / _hf_peft_config_loaded exemption when the wrapper came from get_peft_model. As a result there is no documented way to LoRA-fine-tune an FP8 base via the standard get_peft_model flow.
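The chain above can be reproduced without loading any model. The following is a minimal sketch, not the transformers or peft source: FakeFp8Base, FakePeftWrapper and guard_branch are hypothetical stand-ins, and the first branch is reduced to the part of the condition quoted above.

```python
from types import SimpleNamespace


class FakeFp8Base:
    """Hypothetical stand-in for the FP8 base model (not a transformers class)."""

    is_quantized = True
    _hf_peft_config_loaded = False  # assumption from the analysis: get_peft_model never flips this
    hf_quantizer = SimpleNamespace(is_trainable=False)


class FakePeftWrapper:
    """Hypothetical stand-in for the wrapper returned by get_peft_model."""

    def __init__(self, base):
        self.base_model = base

    def __getattr__(self, name):
        # Runs only when normal lookup fails, so unknown attributes fall through to the base.
        return getattr(self.__dict__["base_model"], name)


def guard_branch(model, is_peft_model):
    """Return which branch of the simplified guard would raise, or None."""
    quantized_base = getattr(model, "is_quantized", False) and not getattr(
        model, "_hf_peft_config_loaded", False
    )
    quantizer = getattr(model, "hf_quantizer", None)
    supports_training = quantizer is not None and quantizer.is_trainable
    if quantized_base and not is_peft_model:
        return "if"  # purely quantized model, no adapters attached
    elif quantized_base and not supports_training:
        return "elif"  # no PEFT exemption on this branch
    return None


wrapped = FakePeftWrapper(FakeFp8Base())
branch = guard_branch(wrapped, is_peft_model=True)
print(branch)
```

In this sketch the wrapped model skips the first branch because it counts as a PEFT model, then falls into the second one because the proxied flags still describe a quantized base whose quantizer is not trainable.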
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import Dataset
model_id = "deepseek-ai/DeepSeek-V4-Flash" # any FP8 checkpoint, e.g. an FP8 DeepSeek-V3.1
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, attn_implementation="eager", trust_remote_code=True
)
tok = AutoTokenizer.from_pretrained(model_id)
# Attach LoRA exactly as the error message / docs recommend
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
ds = Dataset.from_dict({"text": ["hello world"]}).map(lambda x: tok(x["text"]), remove_columns=["text"])
Trainer(model=model, args=TrainingArguments(output_dir="/tmp/x"), train_dataset=ds)
# -> ValueError: ... QuantizationMethod.FP8 ... do not support training
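To confirm which inputs the guard sees before constructing the Trainer, a small diagnostic along these lines may help. It is a sketch only: guard_inputs is a hypothetical helper, and it reads just the three attributes named in the root-cause analysis. The demo object is a stand-in; in practice the wrapped model from the script above would be passed instead.

```python
from types import SimpleNamespace


def guard_inputs(model):
    """Collect the attributes the quantization guard is described as reading."""
    quantizer = getattr(model, "hf_quantizer", None)
    return {
        "wrapper_type": type(model).__name__,
        "is_quantized": bool(getattr(model, "is_quantized", False)),
        "_hf_peft_config_loaded": bool(getattr(model, "_hf_peft_config_loaded", False)),
        # None means no quantizer object was found on the model at all.
        "quantizer_is_trainable": None if quantizer is None else bool(quantizer.is_trainable),
    }


# Stand-in object with the values reported in this issue (not a real model).
demo = SimpleNamespace(
    is_quantized=True,
    _hf_peft_config_loaded=False,
    hf_quantizer=SimpleNamespace(is_trainable=False),
)
report = guard_inputs(demo)
print(report)
```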
Expected behavior
One of the following:
- A get_peft_model-wrapped model (_is_peft_model(model) is True) should bypass the elif "quant-method-not-trainable" branch, since the frozen FP8 base is never updated; only the bf16 LoRA adapters are. (The first if branch already exempts PEFT models; the elif arguably should too.)
- Or: clear, official documentation stating whether FP8 checkpoints can be LoRA-fine-tuned at all, and whether model.add_adapter(...) (which sets _hf_peft_config_loaded) is or isn't a supported substitute, including whether autograd actually backpropagates through FP8 block-quantized linears to the adapters.
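The first option can be sketched as a pure truth-table function. This is an illustration under assumptions, not a patch against transformers: blocks_training and its exempt_peft_in_elif switch are hypothetical names, and the first branch keeps only the part of the condition quoted in the description.

```python
def blocks_training(is_quantized, peft_config_loaded, is_peft_model,
                    quantizer_trainable, exempt_peft_in_elif):
    """Return True if the simplified guard would raise for these inputs."""
    quantized_base = is_quantized and not peft_config_loaded
    if quantized_base and not is_peft_model:
        return True  # purely quantized model with no adapters: always blocked
    if quantized_base and not quantizer_trainable:
        # Proposed change: let PEFT-wrapped models through this branch too.
        return not (exempt_peft_in_elif and is_peft_model)
    return False


# FP8 base wrapped by get_peft_model, as in the reproduction.
current = blocks_training(True, False, True, False, exempt_peft_in_elif=False)
proposed = blocks_training(True, False, True, False, exempt_peft_in_elif=True)
# Bare FP8 base with no adapter stays blocked either way.
bare = blocks_training(True, False, False, False, exempt_peft_in_elif=True)
print(current, proposed, bare)
```

One consequence worth weighing: in this simplified form, only PEFT-wrapped models ever reach the second branch, so the exemption effectively turns that check off for them. Whether that is acceptable depends on whether the backward pass through FP8 linears is actually supported.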
Open questions for maintainers
- Is model.add_adapter(lora_config) (native API, sets _hf_peft_config_loaded=True) the intended supported path for LoRA-on-FP8, or does it merely bypass the guard while the backward pass through FP8 linears is still unsupported?
- [...] get_peft_model flow appears to be the blocker.
Related issues