[Bug]: Qwen3.5-4B on NPU returns "!" as the first generated token on xLLM 0.9.0 #1185

### Your environment

Environment:

* Hardware: 910B3
* CANN: 8.5
* Docker: official xLLM docker environment, consistent with the provided setup
* xLLM version: 0.9.0



### 🐛 Describe the bug

According to `RELEASE.md`, xLLM 0.9.0 supports `Qwen3.5/Qwen3.5-MoE` on NPU.

However, in my environment, `Qwen3.5-4B` consistently generates `"!"` as the first output token, while `Qwen3` works correctly with the same serving setup and request pattern.

Model config highlights:

* `architectures`: `Qwen3_5ForConditionalGeneration`
* `model_type`: `qwen3_5`
* `text_config.model_type`: `qwen3_5_text`
* `full_attention_interval`: `4`
* `layer_types` present (hybrid linear/full attention)
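
To confirm these fields on another machine, a small helper along the following lines could print them from the checkpoint's `config.json`. This is only a sketch: the function name `show_config_highlights` and the path in the usage comment are hypothetical, and it assumes the file is pretty-printed with one key per line.

```bash
#!/usr/bin/env bash
# Print the config.json lines for the keys listed above.
# Assumption: config.json is pretty-printed (one key per line).
# Values that span several lines (for example lists) show only their first line.
show_config_highlights() {
  local config_file="$1"
  grep -nE '"(architectures|model_type|full_attention_interval|layer_types)"[[:space:]]*:' "$config_file"
}

# Usage (hypothetical path; point it at the local checkpoint directory):
#   show_config_highlights /path/to/Qwen3.5-4B/config.json
```

Because `model_type` appears both at the top level and inside `text_config`, it is expected to show up twice in the output.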

Minimal reproduction:

```bash
curl http://localhost:9977/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-4B",
    "messages": [{"role":"user","content":"你好"}],
    "temperature": 0,
    "max_tokens": 1
  }'
```

Observed result:

```json
{"id":"chatcmpl-10922618569676071768-AbnVBNKbWvYDFz7FCZ89Df","object":"chat.completion","created":1775279394,"model":"Qwen3.5-4B","choices":[{"index":0,"message":{"role":"assistant","content":"!"},"tool_calls":[],"finish_reason":"length"}],....
```
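
Since the request above stops after one token, it may also be worth checking whether only the first token is wrong or the whole completion is degenerate. The sketch below reuses the same endpoint, model name, and prompt and only varies `max_tokens`; the helper names `build_payload` and `run_repro` are mine and not part of xLLM.

```bash
#!/usr/bin/env bash
# Build the same chat request as the reproduction, with a configurable token budget.
# Defaults to 16 tokens when no argument is given.
build_payload() {
  local max_tokens="${1:-16}"
  cat <<EOF
{
  "model": "Qwen3.5-4B",
  "messages": [{"role":"user","content":"你好"}],
  "temperature": 0,
  "max_tokens": ${max_tokens}
}
EOF
}

# Send the request to the endpoint used in this report (payload is read from stdin).
run_repro() {
  build_payload "$1" | curl -sS http://localhost:9977/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @-
}

# Usage: compare a 1-token and a 16-token completion.
#   run_repro 1
#   run_repro 16
```

Running both against `Qwen3` and `Qwen3.5-4B` with the same serving setup should make the difference between the two models easier to compare side by side.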

Additional notes:

* `Qwen3` works correctly under the same command pattern.
* I also tested a build that includes the Qwen3.5 support chain up to `#1160`, and the issue persists.
* I looked at `#1171`, but it is still a draft and has not been rebased onto `#1160` yet.

Questions:

1. Is the public checkpoint `Qwen/Qwen3.5-4B` expected to work on NPU with xLLM 0.9.0?
2. Is there a rebased patch after `#1160` / `#1171` for this issue?
3. Is there any required model conversion or special launch flag for Qwen3.5 text-only inference on xLLM?

Thanks.

