Skip to content

Fix dangling think before tool calls in templates#494

Open
Thump604 wants to merge 1 commit intowaybarrios:mainfrom
Thump604:codex/qwen36-template-tool-heal
Open

Fix dangling think before tool calls in templates#494
Thump604 wants to merge 1 commit intowaybarrios:mainfrom
Thump604:codex/qwen36-template-tool-heal

Conversation

@Thump604
Copy link
Copy Markdown
Collaborator

@Thump604 Thump604 commented May 2, 2026

Summary

Fixes #493.

This adds a small pre-template message normalization guard for assistant history where a thinking block is left open before raw tool XML:

<think>
reasoning...
<tool_call>...</tool_call>

Before apply_chat_template, the normalizer now inserts </think> before the first <tool_call> that follows a dangling <think>. That keeps tool XML outside the reasoning span for both bundled and user-supplied templates.

Credit to Cheuk-Yiu Chan for identifying and documenting the Qwen 3.6 template failure mode and template-side repair:

Why

Qwen 3.6 can sustain interleaved thinking across turns. If malformed assistant history leaves <think> open before <tool_call>, the next rendered prompt can condition the model as though the tool call is still reasoning. That can lead to ignored tool calls, reasoning leakage into tool turns, or polluted preserved history.

This patch handles the structural transcript hygiene issue before template rendering, rather than relying on a specific template fork.

Changes

  • Add vllm_mlx.engine.chat_template_safety.normalize_messages_for_chat_template.
  • Preserve the existing OpenAI tool_calls[].function.arguments mapping normalization.
  • Apply the normalization in BatchedEngine and SimpleEngine before apply_chat_template.
  • Add a unit test covering dangling <think> before raw <tool_call> without mutating caller messages.

Validation

  • python -m pytest tests/test_batched_engine.py -q -> 11 passed
  • AI_RUNTIME_BYPASS_SAFETY_GATE=1 /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_batched_engine.py tests/test_chat_template_kwargs.py -q -> 20 passed

@Thump604 Thump604 force-pushed the codex/qwen36-template-tool-heal branch from d4b0dab to fd21de8 Compare May 2, 2026 14:04
@Thump604 Thump604 marked this pull request as ready for review May 2, 2026 14:06
Copy link
Copy Markdown
Collaborator

@janhilgard janhilgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fix for a real Qwen 3.6 failure mode. Pre-template normalization is the right approach — it handles the structural hygiene issue without forking the template.

_close_dangling_think_before_tool_call — logic is correct:

  • Fast-path early return when no <think> or <tool_call> present (majority of messages)
  • rfind("<think>") vs rfind("</think>") correctly detects the last unclosed think block
  • Handles multiple think sections (e.g., <think>a</think> text <think>b<tool_call>...) correctly — the second unclosed think gets closed
  • Only operates on role: "assistant" messages, so user content with these tags is untouched

_message_to_dict — nice improvement over the pure json.dumps(default=str) path. model_dump(exclude_none=True) preserves Pydantic objects properly. The v is not None comprehension is redundant with exclude_none=True but harmless as a defensive guard.

SimpleEngine integration — this PR also fixes a pre-existing gap: SimpleEngine's chat() and _stream_generate_text() previously didn't normalize tool-call arguments at all. Now both engines share the same normalization path.

Module structure — extracting to chat_template_safety.py is clean. The old _normalize_tool_call_arguments_for_template in batched.py is now a trivial passthrough — could be inlined later, but that's a minor cleanup.

Tests cover the key scenarios: dangling think insertion, original message immutability, and Pydantic-style message handling.

Single commit, CI green. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Qwen 3.6: prevent dangling thinking from trapping tool calls in chat-template history

2 participants