LoRA fine-tuning of an FP8 checkpoint is blocked: `get_peft_model` does not clear `validate_quantization_for_training` (QuantizationMethod.FP8)

### System Info

- `transformers` 5.9.0
- `peft` 0.19.1
- `accelerate` 1.13.0
- `torch` 2.9.1 (CUDA 12.9)
- Python 3.12
- Platform: SageMaker training job, 2× p5.48xlarge (16× H100 80GB), FSDP `full_shard`, `attn_implementation="eager"`
- Model: `deepseek-ai/DeepSeek-V4-Flash` (284B MoE, published **FP8-only** — `quantization_config.quant_method == "fp8"`, e4m3, 128×128 weight blocks)

### Who can help?

@SunMarc @MekkCyber (quantization), @BenjaminBossan (PEFT)

### Description

Fine-tuning an FP8-quantized checkpoint with LoRA fails at `Trainer.__init__`:

```
ValueError: The model you are trying to fine-tune is quantized with QuantizationMethod.FP8
but that quantization method do not support training. Please open an issue on GitHub ...
```

The error message and the docs both say the supported route for a quantized base model is to **attach trainable LoRA adapters** on top. We do exactly that via `peft.get_peft_model(model, LoraConfig(...))` **before** constructing the `Trainer`, yet the guard still raises.

Root cause (transformers 5.9.0, `trainer_utils.py::validate_quantization_for_training`):

```python
_is_quantized_and_base_model = getattr(model, "is_quantized", False) and not getattr(
    model, "_hf_peft_config_loaded", False
)
...
elif _is_quantized_and_base_model and not _quantization_method_supports_training:
    raise ValueError(... "do not support training" ...)
```

- `get_peft_model` wraps the model in a `PeftModel` but **does not set `_hf_peft_config_loaded` on the inner base model**. Confirmed: that attribute is never assigned anywhere in `peft`; it is only set by transformers' native `PeftAdapterMixin.add_adapter`/`load_adapter` in `integrations/peft.py`.
- `PeftModel.__getattr__` proxies `is_quantized` to the FP8 base → `True`.
- So `_is_quantized_and_base_model` stays `True`, and because the FP8 quantizer reports `is_trainable == False`, the **`elif` branch raises** — even though a LoRA adapter is attached.

Note the earlier `if` branch *does* exempt PEFT models (`... and not _is_peft_model(model) ...`), but the `elif` (quant-method-not-trainable) has **no `_is_peft_model` / `_hf_peft_config_loaded` exemption** when the wrapper came from `get_peft_model`. As a result there is **no documented way** to LoRA-fine-tune an FP8 base via the standard `get_peft_model` flow.
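
To make the attribute lookup concrete, here is a minimal, dependency-free sketch of the mechanism described above. `FakeFP8Base` and `FakePeftWrapper` are hypothetical stand-ins, not the real `transformers`/`peft` classes; the only thing they model is a wrapper that forwards unknown attributes to the wrapped model.

```python
class FakeFP8Base:
    """Hypothetical stand-in for an FP8-quantized base model."""

    is_quantized = True  # what a quantized model reports
    # Note: no `_hf_peft_config_loaded` attribute, as after get_peft_model.


class FakePeftWrapper:
    """Hypothetical stand-in for the wrapper returned by get_peft_model."""

    def __init__(self, base):
        self.base_model = base

    def __getattr__(self, name):
        # Only called when normal lookup fails; forward to the wrapped model.
        return getattr(self.base_model, name)


wrapped = FakePeftWrapper(FakeFP8Base())

# Same expression as in the guard quoted above.
is_quantized_and_base_model = getattr(wrapped, "is_quantized", False) and not getattr(
    wrapped, "_hf_peft_config_loaded", False
)
print(is_quantized_and_base_model)
```

The wrapper answers `is_quantized` from the base, while the lookup of `_hf_peft_config_loaded` fails on both objects and falls back to `False`, so the flag stays truthy even though an adapter wrapper is present.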

### Reproduction

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import Dataset

model_id = "deepseek-ai/DeepSeek-V4-Flash"   # any FP8 checkpoint, e.g. an FP8 DeepSeek-V3.1
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, attn_implementation="eager", trust_remote_code=True
)
tok = AutoTokenizer.from_pretrained(model_id)

# Attach LoRA exactly as the error message / docs recommend
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

ds = Dataset.from_dict({"text": ["hello world"]}).map(lambda x: tok(x["text"]), remove_columns=["text"])
Trainer(model=model, args=TrainingArguments(output_dir="/tmp/x"), train_dataset=ds)
# -> ValueError: ... QuantizationMethod.FP8 ... do not support training
```
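
For triage, a small helper like the one below can dump the values the guard reads just before `Trainer(...)` is constructed. It is a sketch only: `guard_inputs` is a hypothetical name, and `hf_quantizer` / `is_trainable` are assumed to be where the quantizer and its trainability flag live. The dummy object merely stands in for the wrapped FP8 model.

```python
from types import SimpleNamespace


def guard_inputs(model):
    """Collect the attributes the quoted guard reads, as seen from `model`."""
    quantizer = getattr(model, "hf_quantizer", None)  # assumed attribute name
    return {
        "type": type(model).__name__,
        "is_quantized": getattr(model, "is_quantized", False),
        "_hf_peft_config_loaded": getattr(model, "_hf_peft_config_loaded", False),
        "quantizer_is_trainable": getattr(quantizer, "is_trainable", None),
    }


# Dummy object standing in for the wrapped FP8 model; pass the real `model` instead.
dummy = SimpleNamespace(is_quantized=True, hf_quantizer=SimpleNamespace(is_trainable=False))
report = guard_inputs(dummy)
print(report)
```

On the failing setup described here, the expectation is `is_quantized` truthy, `_hf_peft_config_loaded` falsy, and the quantizer reporting that it is not trainable.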

### Expected behavior

One of the following:

1. A `get_peft_model`-wrapped model (`_is_peft_model(model) is True`) should **bypass the `elif` "quant-method-not-trainable" branch**, since the frozen FP8 base is never updated — only the bf16 LoRA adapters are. (The first `if` branch already exempts PEFT models; the `elif` arguably should too.)
2. Or: clear, official documentation that FP8 checkpoints cannot be LoRA-fine-tuned at all, and whether `model.add_adapter(...)` (which sets `_hf_peft_config_loaded`) is or isn't a supported substitute, including whether autograd actually backpropagates through FP8 block-quantized linears to the adapters.
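
Option 1 could look roughly like the sketch below. It models only the condition of the `elif` branch as a pure function over booleans; `should_block_training` and its parameter names are hypothetical, and the extra `is_peft_model` term is the proposed change, not current transformers behavior.

```python
def should_block_training(is_quantized, hf_peft_config_loaded, method_is_trainable, is_peft_model):
    """Return True if the 'does not support training' branch would raise.

    Sketch of the proposed condition only; the earlier `if` branch and the
    rest of the real function are deliberately left out.
    """
    is_quantized_and_base_model = is_quantized and not hf_peft_config_loaded
    # Proposed addition (assumption): a PEFT wrapper exempts the model here,
    # mirroring the exemption the first `if` branch already has.
    return is_quantized_and_base_model and not method_is_trainable and not is_peft_model


# FP8 base wrapped by get_peft_model: flag unset, but the wrapper is a PEFT model.
wrapped_fp8 = should_block_training(
    is_quantized=True, hf_peft_config_loaded=False, method_is_trainable=False, is_peft_model=True
)
# Same inputs without the PEFT wrapper: still blocked.
bare_fp8 = should_block_training(
    is_quantized=True, hf_peft_config_loaded=False, method_is_trainable=False, is_peft_model=False
)
print(wrapped_fp8, bare_fp8)
```

Whether lifting the guard is sufficient is a separate matter: it says nothing about whether the backward pass through FP8 block-quantized linears works.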

### Open questions for maintainers

- Is `model.add_adapter(lora_config)` (native API, sets `_hf_peft_config_loaded=True`) the intended supported path for LoRA-on-FP8, or does it merely bypass the guard while the backward pass through FP8 linears is still unsupported?
- Does this overlap with #46295 (Training Support for FP8) and the closed-without-resolution #41516 / #37927? If FP8 LoRA is intended to work, the `get_peft_model` flow appears to be the blocker.

### Related issues

- #46295 (open) — Training Support for FP8
- #41516 (closed, no resolution) — DeepSeek V3.1 LoRA → same FP8 ValueError
- #37927 (closed) — request FP8 training support
- #39410 (open) — FP8 training for MP/TP

LoRA fine-tuning of an FP8 checkpoint is blocked: `get_peft_model` does not clear `validate_quantization_for_training` (QuantizationMethod.FP8) #46736
