[Bug]: Qwen3.5-4B on NPU returns "!" as the first generated token on xLLM 0.9.0 #1185
Open
Labels
bug: Something isn't working
Description
Your environment
Environment:
- Hardware: 910B3
- CANN: 8.5
- Docker: official xLLM docker environment, consistent with the provided setup
- xLLM version: 0.9.0
🐛 Describe the bug
According to RELEASE.md, xLLM 0.9.0 supports Qwen3.5/Qwen3.5-MoE on NPU.
However, in my environment, Qwen3.5-4B consistently generates "!" as the first output token, while Qwen3 works correctly with the same serving setup and request pattern.
Model config highlights:
- architectures: Qwen3_5ForConditionalGeneration
- model_type: qwen3_5
- text_config.model_type: qwen3_5_text
- full_attention_interval: 4
- layer_types present (hybrid linear/full attention)
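To make the config highlights above easy to verify against a local checkpoint, here is a minimal sketch that summarizes the relevant fields of a `config.json`. It assumes the text-model fields (`layer_types`, `full_attention_interval`) live under `text_config` and falls back to the top level otherwise. The function name `summarize_layers` is hypothetical and not part of xLLM or any other library.

```python
import json
import sys
from collections import Counter


def summarize_layers(config: dict) -> dict:
    """Collect the fields that describe the hybrid attention layout.

    Assumption: text-model settings are nested under "text_config";
    if that key is missing, the top-level config is used instead.
    """
    text_cfg = config.get("text_config", config)
    layer_types = text_cfg.get("layer_types", [])
    return {
        "architectures": config.get("architectures"),
        "model_type": config.get("model_type"),
        "text_model_type": text_cfg.get("model_type"),
        "full_attention_interval": text_cfg.get(
            "full_attention_interval", config.get("full_attention_interval")
        ),
        "num_layers": len(layer_types),
        # How many layers of each type the checkpoint declares.
        "layer_type_counts": dict(Counter(layer_types)),
    }


if __name__ == "__main__":
    # Usage: python summarize_config.py /path/to/checkpoint/config.json
    with open(sys.argv[1], encoding="utf-8") as f:
        print(json.dumps(summarize_layers(json.load(f)), indent=2))
```

Attaching this summary to the report would let maintainers confirm that the checkpoint really declares a hybrid layer layout, without needing the full config file.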
Minimal reproduction:
curl http://localhost:9977/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3.5-4B",
"messages": [{"role":"user","content":"你好"}],
"temperature": 0,
"max_tokens": 1
}'
Observed result:
{"id":"chatcmpl-10922618569676071768-AbnVBNKbWvYDFz7FCZ89Df","object":"chat.completion","created":1775279394,"model":"Qwen3.5-4B","choices":[{"index":0,"message":{"role":"assistant","content":"!"},"tool_calls":[],"finish_reason":"length"}],....
Additional notes:
- Qwen3 works correctly under the same command pattern.
- I also tested code including the Qwen3.5 support chain up to #1160, but the issue remained.
- I checked #1171, but it is still a draft and has not yet been rebased after #1160.
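For side-by-side comparison of models under the same request pattern, the curl reproduction can be scripted. The following is a rough sketch using only the Python standard library. It assumes the server from the reproduction is listening on `http://localhost:9977` and returns the response shape shown in the observed result. The helper names (`build_payload`, `first_token`, `query`) are hypothetical, and the model names are taken from the command line rather than assumed.

```python
import json
import sys
import urllib.request


def build_payload(model: str, prompt: str = "你好") -> dict:
    """Build the same request body as the curl reproduction."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 1,
    }


def first_token(response: dict) -> str:
    """Extract the generated text from a chat completion response."""
    return response["choices"][0]["message"]["content"]


def query(model: str, base_url: str = "http://localhost:9977") -> dict:
    """POST the payload to the chat completions endpoint and parse the JSON reply."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Usage: python compare_first_token.py <served-model-name> [<served-model-name> ...]
    for name in sys.argv[1:]:
        print(f"{name}: first token = {first_token(query(name))!r}")
```

Each model has to be served under the given name before the script is run. With `max_tokens` set to 1 and `temperature` set to 0, the printed value is the single first generated token for each model.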
Questions:
- Is the public checkpoint Qwen/Qwen3.5-4B expected to work on NPU with xLLM 0.9.0?
- Is there a rebased patch after #1160/#1171 for this issue?
- Is there any required model conversion or special launch flag for Qwen3.5 text-only inference on xLLM?
Thanks.