Fix dangling think before tool calls in templates by Thump604 · Pull Request #494 · waybarrios/vllm-mlx

Thump604 · 2026-05-02T13:58:10Z

Summary

Fixes #493.

This adds a small pre-template message normalization guard for assistant history where a thinking block is left open before raw tool XML:

<think>
reasoning...
<tool_call>...</tool_call>

Before apply_chat_template, the normalizer now inserts </think> before the first <tool_call> that follows a dangling <think>. That keeps tool XML outside the reasoning span for both bundled and user-supplied templates.

Credit to Cheuk-Yiu Chan for identifying and documenting the Qwen 3.6 template failure mode and template-side repair:

Why

Qwen 3.6 can sustain interleaved thinking across turns. If malformed assistant history leaves <think> open before <tool_call>, the next rendered prompt can condition the model as though the tool call is still reasoning. That can lead to ignored tool calls, reasoning leakage into tool turns, or polluted preserved history.

This patch handles the structural transcript hygiene issue before template rendering, rather than relying on a specific template fork.

Changes

Add vllm_mlx.engine.chat_template_safety.normalize_messages_for_chat_template.
Preserve the existing OpenAI tool_calls[].function.arguments mapping normalization.
Apply the normalization in BatchedEngine and SimpleEngine before apply_chat_template.
Add a unit test covering dangling <think> before raw <tool_call> without mutating caller messages.

Validation

python -m pytest tests/test_batched_engine.py -q -> 11 passed
AI_RUNTIME_BYPASS_SAFETY_GATE=1 /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_batched_engine.py tests/test_chat_template_kwargs.py -q -> 20 passed

janhilgard

Good fix for a real Qwen 3.6 failure mode. Pre-template normalization is the right approach — it handles the structural hygiene issue without forking the template.

_close_dangling_think_before_tool_call — logic is correct:

Fast-path early return when no <think> or <tool_call> present (majority of messages)
rfind("<think>") vs rfind("</think>") correctly detects the last unclosed think block
Handles multiple think sections (e.g., <think>a</think> text <think>b<tool_call>...) correctly — the second unclosed think gets closed
Only operates on role: "assistant" messages, so user content with these tags is untouched

_message_to_dict — nice improvement over the pure json.dumps(default=str) path. model_dump(exclude_none=True) preserves Pydantic objects properly. The v is not None comprehension is redundant with exclude_none=True but harmless as a defensive guard.

SimpleEngine integration — this PR also fixes a pre-existing gap: SimpleEngine's chat() and _stream_generate_text() previously didn't normalize tool-call arguments at all. Now both engines share the same normalization path.

Module structure — extracting to chat_template_safety.py is clean. The old _normalize_tool_call_arguments_for_template in batched.py is now a trivial passthrough — could be inlined later, but that's a minor cleanup.

Tests cover the key scenarios: dangling think insertion, original message immutability, and Pydantic-style message handling.

Single commit, CI green. LGTM.

Fix dangling think before tool calls in templates

fd21de8

Thump604 force-pushed the codex/qwen36-template-tool-heal branch from d4b0dab to fd21de8 Compare May 2, 2026 14:04

Thump604 marked this pull request as ready for review May 2, 2026 14:06

janhilgard approved these changes May 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix dangling think before tool calls in templates#494

Fix dangling think before tool calls in templates#494
Thump604 wants to merge 1 commit intowaybarrios:mainfrom
Thump604:codex/qwen36-template-tool-heal

Thump604 commented May 2, 2026

Uh oh!

janhilgard left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Thump604 commented May 2, 2026

Summary

Why

Changes

Validation

Uh oh!

janhilgard left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants