LoRA fine-tuning of an FP8 checkpoint is blocked: get_peft_model does not clear validate_quantization_for_training (QuantizationMethod.FP8) #46736

@brunopistone

System Info

  • transformers 5.9.0
  • peft 0.19.1
  • accelerate 1.13.0
  • torch 2.9.1 (CUDA 12.9)
  • Python 3.12
  • Platform: SageMaker training job, 2× p5.48xlarge (16× H100 80GB), FSDP full_shard, attn_implementation="eager"
  • Model: deepseek-ai/DeepSeek-V4-Flash (284B MoE, published FP8-only: quantization_config.quant_method == "fp8", e4m3, 128×128 weight blocks)

Who can help?

@SunMarc @MekkCyber (quantization), @BenjaminBossan (PEFT)

Description

Fine-tuning an FP8-quantized checkpoint with LoRA fails at Trainer.__init__:

ValueError: The model you are trying to fine-tune is quantized with QuantizationMethod.FP8
but that quantization method do not support training. Please open an issue on GitHub ...

The error message and the docs both say the supported route for a quantized base model is to attach trainable LoRA adapters on top. We do exactly that via peft.get_peft_model(model, LoraConfig(...)) before constructing the Trainer, yet the guard still raises.

Root cause (transformers 5.9.0, trainer_utils.py::validate_quantization_for_training):

_is_quantized_and_base_model = getattr(model, "is_quantized", False) and not getattr(
    model, "_hf_peft_config_loaded", False
)
...
elif _is_quantized_and_base_model and not _quantization_method_supports_training:
    raise ValueError(... "do not support training" ...)
  • get_peft_model wraps the model in a PeftModel, but does not set _hf_peft_config_loaded on the (inner) base model (confirmed: that attribute is never assigned anywhere in peft; it is only set by transformers' native PeftAdapterMixin.add_adapter/load_adapter, integrations/peft.py).
  • PeftModel.__getattr__ proxies is_quantized to the FP8 base → True.
  • So _is_quantized_and_base_model stays True, and because the FP8 quantizer reports is_trainable == False, the elif branch raises — even though a LoRA adapter is attached.
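
The chain above can be imitated without loading any model. The following is a minimal sketch, not the actual transformers or peft code: FakeFP8Base and FakeWrapper are hypothetical stand-ins, and the wrapper only mimics the attribute forwarding described above.

```python
class FakeFP8Base:
    """Hypothetical stand-in for an FP8-quantized base model."""

    is_quantized = True  # the real flag is set when a quantized checkpoint is loaded


class FakeWrapper:
    """Hypothetical stand-in for a wrapper that forwards unknown attributes to its base."""

    def __init__(self, base):
        self.base_model = base

    def __getattr__(self, name):
        # Only called when normal lookup fails, so unknown names fall through to the base.
        return getattr(self.base_model, name)


wrapped = FakeWrapper(FakeFP8Base())

# Same expression as in the excerpt from validate_quantization_for_training.
is_quantized_and_base_model = getattr(wrapped, "is_quantized", False) and not getattr(
    wrapped, "_hf_peft_config_loaded", False
)
print(is_quantized_and_base_model)
```

Because neither the wrapper nor the base defines _hf_peft_config_loaded, the second getattr falls back to False and the flag stays truthy, which matches the behavior reported here.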

Note that the earlier if branch does exempt PEFT models (... and not _is_peft_model(model) ...), but the elif (quant-method-not-trainable) branch has no _is_peft_model check, and its only escape hatch, _hf_peft_config_loaded, is never set when the wrapper comes from get_peft_model. As a result, there is no documented way to LoRA-fine-tune an FP8 base via the standard get_peft_model flow.

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import Dataset

model_id = "deepseek-ai/DeepSeek-V4-Flash"   # any FP8 checkpoint, e.g. an FP8 DeepSeek-V3.1
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, attn_implementation="eager", trust_remote_code=True
)
tok = AutoTokenizer.from_pretrained(model_id)

# Attach LoRA exactly as the error message / docs recommend
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

ds = Dataset.from_dict({"text": ["hello world"]}).map(lambda x: tok(x["text"]), remove_columns=["text"])
Trainer(model=model, args=TrainingArguments(output_dir="/tmp/x"), train_dataset=ds)
# -> ValueError: ... QuantizationMethod.FP8 ... do not support training
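
To check a model before constructing the Trainer, a small pre-flight helper along these lines may be useful. It is a sketch under assumptions: guard_inputs is a hypothetical helper, it assumes the quantizer is reachable as model.hf_quantizer, and it only mirrors the elif condition quoted in this issue rather than the full validation function. The demo uses a SimpleNamespace stand-in instead of a real checkpoint.

```python
from types import SimpleNamespace


def guard_inputs(model):
    """Report the attributes read by the quoted guard, using only getattr."""
    # Assumption: the quantizer object is exposed as model.hf_quantizer.
    quantizer = getattr(model, "hf_quantizer", None)
    is_quantized = bool(getattr(model, "is_quantized", False))
    peft_loaded = bool(getattr(model, "_hf_peft_config_loaded", False))
    trainable = bool(getattr(quantizer, "is_trainable", False))
    return {
        "is_quantized": is_quantized,
        "_hf_peft_config_loaded": peft_loaded,
        "quantizer_is_trainable": trainable,
        # Mirrors only the elif condition from the excerpt above.
        "would_raise": is_quantized and not peft_loaded and not trainable,
    }


# Stand-in for a get_peft_model-wrapped FP8 model, not a real checkpoint.
fake = SimpleNamespace(is_quantized=True, hf_quantizer=SimpleNamespace(is_trainable=False))
report = guard_inputs(fake)
print(report)
```

On a real model, calling guard_inputs(model) right after get_peft_model should show which of the three inputs keeps the guard active.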

Expected behavior

One of the following:

  1. A get_peft_model-wrapped model (_is_peft_model(model) is True) should bypass the elif "quant-method-not-trainable" branch, since the frozen FP8 base is never updated — only the bf16 LoRA adapters are. (The first if branch already exempts PEFT models; the elif arguably should too.)
  2. Or: clear, official documentation stating whether FP8 checkpoints can be LoRA-fine-tuned at all, and whether model.add_adapter(...) (which sets _hf_peft_config_loaded) is a supported substitute, including whether autograd actually backpropagates through FP8 block-quantized linears to the adapters.
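
Option 1 can be sketched as a change to the elif predicate alone. This is an illustration of the proposal, not a patch against transformers: both functions are hypothetical, take plain booleans instead of a model, and leave out the other branches of the real validation function.

```python
def current_elif_raises(is_quantized, peft_config_loaded, method_trainable):
    """Condition of the elif branch as quoted in this issue."""
    quantized_base = is_quantized and not peft_config_loaded
    return quantized_base and not method_trainable


def proposed_elif_raises(is_quantized, peft_config_loaded, method_trainable, is_peft_wrapped):
    """Same condition with the PEFT exemption suggested in option 1."""
    quantized_base = is_quantized and not peft_config_loaded
    return quantized_base and not method_trainable and not is_peft_wrapped


# FP8 base wrapped by get_peft_model: quantized, flag unset, method not trainable.
print(current_elif_raises(True, False, False))
print(proposed_elif_raises(True, False, False, is_peft_wrapped=True))
```

With the exemption in place, the wrapped case no longer satisfies the condition, while an unwrapped FP8 base (is_peft_wrapped=False) is still rejected.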

Open questions for maintainers

Related issues
