W4AFP8 checkpoint loading error #5120

### System Info

8xH200

### Who can help?

_No response_

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

download: https://huggingface.co/Barrrrry/DeepSeek-R1-W4AFP8/tree/main
run: `trtllm-serve deepseek-W4AFP8/ --backend pytorch --tp_size=8 --ep_size=8 --kv_cache_free_gpu_memory_fraction 0.45 --max_batch_size 512 --max_num_tokens 9000 --extra_llm_api_options=extra-llm-api-config.json --host=localhost --port=54321`

### Expected behavior

The checkpoint loads and the server starts without errors.

### actual behavior

```
Loading weights:  98%|█████████▊| 1764/1802 [01:59<00:02, 14.75it/s]                                                                                                                                                                      
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Failed to initialize executor on rank 4: 'Linear' object has no attribute 'weight_scale'                                                                                                     
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Traceback (most recent call last):                                                                                                                                                           
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 722, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 207, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 362, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1055, in _load_model
    model.load_weights(weights)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 1367, in load_weights
    module.weight_scale.data.copy_(fused_a_scale)
    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'Linear' object has no attribute 'weight_scale'

```

### additional notes
- Running from docker built on commit `e2863a3159b5fb306c695f7519c45616cc892018`
- extra-llm-api-config.yml:
```
enable_attention_dp: true
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 3
```
- As far as I can tell, this happens only when MTP is enabled
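
To narrow down which module is missing its scale, one option is to compare the scale tensors stored for a regular decoder layer against those stored for the MTP layer. The following is a minimal sketch, not a confirmed diagnosis. It assumes the checkpoint is sharded with a standard `model.safetensors.index.json`, that tensor names follow the `model.layers.<N>.` pattern, and that the MTP layer is stored as layer 61; the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_weight_map(checkpoint_dir):
    """Read the tensor-name -> shard-file mapping from a sharded safetensors index.

    Assumes the checkpoint directory contains model.safetensors.index.json.
    """
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        return json.load(f)["weight_map"]


def scale_tensors(weight_map, prefix):
    """Return sorted tensor names under `prefix` whose name mentions 'scale'."""
    return sorted(
        name for name in weight_map
        if name.startswith(prefix) and "scale" in name
    )


def compare_layers(weight_map, layer_a, layer_b):
    """Return (only_in_a, only_in_b): scale-tensor suffixes unique to each layer."""
    def suffixes(layer):
        # Trailing dot keeps "model.layers.6." from matching layer 60 or 61.
        prefix = f"model.layers.{layer}."
        return {name[len(prefix):] for name in scale_tensors(weight_map, prefix)}

    a, b = suffixes(layer_a), suffixes(layer_b)
    return sorted(a - b), sorted(b - a)
```

For example, `compare_layers(load_weight_map("deepseek-W4AFP8/"), 60, 61)` would list scale tensors that exist for layer 60 but not for layer 61, and vice versa. If the MTP layer's attention projections have no scale tensors, that would be consistent with the loader creating an unquantized `Linear` there while still trying to copy a fused scale into it.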
