[Bug]: Qwen3.5-4B on NPU returns "!" as the first generated token on xLLM 0.9.0 #1185

@xpluspro

Your environment

Environment:

  • Hardware: 910B3
  • CANN: 8.5
  • Docker: official xLLM docker environment, consistent with the provided setup
  • xLLM version: 0.9.0

🐛 Describe the bug

According to RELEASE.md, xLLM 0.9.0 supports Qwen3.5/Qwen3.5-MoE on NPU.

However, in my environment, Qwen3.5-4B consistently generates "!" as the first output token, while Qwen3 works correctly with the same serving setup and request pattern.

Model config highlights:

  • architectures: Qwen3_5ForConditionalGeneration
  • model_type: qwen3_5
  • text_config.model_type: qwen3_5_text
  • full_attention_interval: 4
  • layer_types present (hybrid linear/full attention)
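For anyone checking their own checkpoint, here is a minimal sketch of how the `layer_types` list could be cross-checked against `full_attention_interval`. It assumes the convention used by Hugging Face hybrid-attention configs, where every `full_attention_interval`-th layer (counting from 1) is full attention and the rest are linear attention; I have not verified that xLLM reads the config this way. The function names and the `num_hidden_layers` key lookup are illustrative, not part of xLLM.

```python
def expected_layer_types(num_hidden_layers: int, full_attention_interval: int = 4) -> list[str]:
    """Derive the hybrid layer pattern from the interval.

    Assumed convention: layer i (0-based) is full attention when
    (i + 1) is a multiple of the interval, linear attention otherwise.
    """
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
        for i in range(num_hidden_layers)
    ]


def layer_types_match(text_config: dict) -> bool:
    """Compare the config's own `layer_types` with the interval-derived pattern.

    `text_config` is a plain dict holding `num_hidden_layers`,
    `full_attention_interval`, and `layer_types` (where these keys live in
    config.json is an assumption; adjust the lookup for your checkpoint).
    """
    derived = expected_layer_types(
        text_config["num_hidden_layers"],
        text_config["full_attention_interval"],
    )
    return text_config.get("layer_types") == derived
```

A mismatch would only indicate that the config does not follow the assumed pattern; it would not by itself explain the "!" output.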

Minimal reproduction:

curl http://localhost:9977/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-4B",
    "messages": [{"role":"user","content":"你好"}],
    "temperature": 0,
    "max_tokens": 1
  }'

Observed result:

{"id":"chatcmpl-10922618569676071768-AbnVBNKbWvYDFz7FCZ89Df","object":"chat.completion","created":1775279394,"model":"Qwen3.5-4B","choices":[{"index":0,"message":{"role":"assistant","content":"!"},"tool_calls":[],"finish_reason":"length"}],....

Additional notes:

  • Qwen3 works correctly under the same command pattern.
  • I also tested a build that includes the Qwen3.5 support chain up to #1160, but the issue persists.
  • I checked #1171, but it is still a draft and has not been rebased since #1160.

Questions:

  1. Is the public checkpoint Qwen/Qwen3.5-4B expected to work on NPU with xLLM 0.9.0?
  2. Is there a rebased patch after #1160 / #1171 for this issue?
  3. Is there any required model conversion or special launch flag for Qwen3.5 text-only inference on xLLM?

Thanks.

Labels: bug
