System Info
8xH200
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Download the checkpoint: https://huggingface.co/Barrrrry/DeepSeek-R1-W4AFP8/tree/main
Run:
```bash
trtllm-serve deepseek-W4AFP8/ \
  --backend pytorch \
  --tp_size=8 --ep_size=8 \
  --kv_cache_free_gpu_memory_fraction 0.45 \
  --max_batch_size 512 \
  --max_num_tokens 9000 \
  --extra_llm_api_options=extra-llm-api-config.json \
  --host=localhost --port=54321
```
Expected behavior
The server loads the model and starts serving.
Actual behavior
Weight loading stops at about 98% and executor initialization fails on rank 4:
```
Loading weights: 98%|█████████▊| 1764/1802 [01:59<00:02, 14.75it/s]
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Failed to initialize executor on rank 4: 'Linear' object has no attribute 'weight_scale'
[06/11/2025-07:16:05] [TRT-LLM] [RANK 4] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 722, in worker_main
    worker: GenerationExecutorWorker = worker_cls(
                                       ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 139, in __init__
    self.engine = _create_engine()
                  ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/executor/worker.py", line 137, in _create_engine
    return create_executor(**args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor_creator.py", line 207, in create_py_executor
    model_engine = PyTorchModelEngine(
                   ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 362, in __init__
    self.model = self._load_model(
                 ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1055, in _load_model
    model.load_weights(weights)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/models/modeling_deepseekv3.py", line 1367, in load_weights
    module.weight_scale.data.copy_(fused_a_scale)
    ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(
AttributeError: 'Linear' object has no attribute 'weight_scale'
```
Additional notes
- Running from a Docker image built on commit `e2863a3159b5fb306c695f7519c45616cc892018`
- extra-llm-api-config.yml:
  ```yaml
  enable_attention_dp: true
  speculative_config:
    decoding_type: MTP
    num_nextn_predict_layers: 3
  ```
- I found that this error occurs only when MTP speculative decoding is enabled.
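To confirm that the failure depends on MTP, one option is to launch the server twice with configs that differ only in `speculative_config`. The sketch below writes both variants and prints the matching `trtllm-serve` commands. It assumes `trtllm-serve` parses the `--extra_llm_api_options` file as YAML; the helper names `to_yaml` and `build_command` and the output file names are hypothetical, while the config keys and the remaining flags mirror the reproduction command and config in this report.

```python
"""Sketch: write MTP / no-MTP config variants to isolate the load_weights failure.

Assumption: trtllm-serve reads --extra_llm_api_options as YAML. Helper names and
output file names are hypothetical; keys and flags mirror the issue report.
"""
import shlex
from pathlib import Path

BASE = {"enable_attention_dp": True}
MTP = {"speculative_config": {"decoding_type": "MTP", "num_nextn_predict_layers": 3}}


def to_yaml(obj: dict, indent: int = 0) -> str:
    """Emit a tiny YAML subset: nested dicts whose leaves are bools, ints or strings."""
    pad = "  " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, bool):
            # Python would render True/False; YAML expects lowercase true/false.
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


def build_command(model_dir: str, config_path: str) -> str:
    """Assemble the serve command from the report, pointing at a given config file."""
    args = [
        "trtllm-serve", model_dir,
        "--backend", "pytorch",
        "--tp_size=8", "--ep_size=8",
        "--kv_cache_free_gpu_memory_fraction", "0.45",
        "--max_batch_size", "512",
        "--max_num_tokens", "9000",
        f"--extra_llm_api_options={config_path}",
        "--host=localhost", "--port=54321",
    ]
    return shlex.join(args)  # quotes any argument that needs it


if __name__ == "__main__":
    variants = {"no-mtp": BASE, "mtp": {**BASE, **MTP}}
    for name, cfg in variants.items():
        path = Path(f"extra-llm-api-config-{name}.yml")
        path.write_text(to_yaml(cfg) + "\n")
        print(build_command("deepseek-W4AFP8/", str(path)))
```

Running it writes `extra-llm-api-config-no-mtp.yml` and `extra-llm-api-config-mtp.yml` to the current directory. If only the `mtp` command fails in `load_weights`, that is consistent with the observation that the error requires MTP.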