W4AFP8 checkpoint loading error #5120

Open
@Wokzy

Description

System Info

8xH200

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

download: https://huggingface.co/Barrrrry/DeepSeek-R1-W4AFP8/tree/main
run: trtllm-serve deepseek-W4AFP8/ --backend pytorch --tp_size=8 --ep_size=8 --kv_cache_free_gpu_memory_fraction 0.45 --max_batch_size 512 --max_num_tokens 9000 --extra_llm_api_options=extra-llm-api-config.json --host=localhost --port=54321

Expected behavior

The checkpoint loads and the server starts without errors.

Actual behavior

Loading weights:  98%|█████████▊| 1764/1802 [01:59<00:02, 14.75it/s]
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Failed to initialize executor on rank 4: 'Linear' object has no attribute 'weight_scale'
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 722, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 207, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 362, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1055, in _load_model
    model.load_weights(weights)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 1367, in load_weights
    module.weight_scale.data.copy_(fused_a_scale)
    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'Linear' object has no attribute 'weight_scale'
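
For readers unfamiliar with this failure mode, here is a minimal sketch of how such an error can arise. It assumes (unverified) that the load path expects every matched layer to carry a `weight_scale`, while some layer was constructed without one. The `Module`, `Linear`, and `copy_fused_scale` names below are simplified, hypothetical stand-ins and are not TensorRT-LLM or PyTorch code.

```python
class Module:
    """Simplified stand-in for a module whose parameters live in a dict
    (hypothetical; loosely mirrors how attribute lookup falls back to
    __getattr__ when a parameter was never registered)."""

    def __init__(self):
        self._parameters = {}

    def register_parameter(self, name, value):
        self._parameters[name] = value

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        params = self.__dict__.get("_parameters", {})
        if name in params:
            return params[name]
        raise AttributeError(
            f"'{type(self).__name__}' object has no attribute '{name}'"
        )


class Linear(Module):
    """Hypothetical layer: weight_scale exists only when quantized."""

    def __init__(self, quantized):
        super().__init__()
        self.register_parameter("weight", [0.0])
        if quantized:
            self.register_parameter("weight_scale", [1.0])


def copy_fused_scale(module, fused_scale):
    """Copy the scale only if the layer actually has one.

    Returns True when the copy happened, False when it was skipped.
    """
    if not hasattr(module, "weight_scale"):
        return False
    module.weight_scale[:] = fused_scale  # in-place update of the stored list
    return True
```

Accessing `Linear(quantized=False).weight_scale` raises the same kind of `AttributeError` as in the traceback, while the guarded helper skips the layer instead. Whether skipping is the right fix here, or whether the layer should have been built with a scale in the first place, is a question for the maintainers.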

Additional notes

  • Running from docker built on commit e2863a3159b5fb306c695f7519c45616cc892018
  • extra-llm-api-config.yml:
enable_attention_dp: true
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 3
  • As far as I can tell, this happens only when MTP is enabled.
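
To make the MTP on/off comparison easy to repeat, a small helper along these lines can generate both config variants. This is only a sketch: the keys are the ones from the config above, while the function names and the output file names (`config-mtp.yml`, `config-no-mtp.yml`) are hypothetical.

```python
from pathlib import Path


def build_config(enable_mtp: bool) -> str:
    """Return the YAML text for the extra LLM API options.

    Keys are copied from the config in this report; nothing else is added.
    """
    lines = ["enable_attention_dp: true"]
    if enable_mtp:
        lines += [
            "speculative_config:",
            "  decoding_type: MTP",
            "  num_nextn_predict_layers: 3",
        ]
    return "\n".join(lines) + "\n"


def write_configs(directory: str) -> list:
    """Write both variants into `directory` and return their paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, enable_mtp in (("config-mtp.yml", True), ("config-no-mtp.yml", False)):
        path = out_dir / name  # hypothetical file names
        path.write_text(build_config(enable_mtp))
        paths.append(path)
    return paths
```

Running the same trtllm-serve command once with each file should show whether the error follows the speculative_config block.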
