Fix dangling think before tool calls in templates#494
Fix dangling think before tool calls in templates#494Thump604 wants to merge 1 commit intowaybarrios:mainfrom
Conversation
d4b0dab to
fd21de8
Compare
janhilgard
left a comment
There was a problem hiding this comment.
Good fix for a real Qwen 3.6 failure mode. Pre-template normalization is the right approach — it handles the structural hygiene issue without forking the template.
_close_dangling_think_before_tool_call — logic is correct:
- Fast-path early return when no
<think>or<tool_call>present (majority of messages) rfind("<think>")vsrfind("</think>")correctly detects the last unclosed think block- Handles multiple think sections (e.g.,
<think>a</think> text <think>b<tool_call>...) correctly — the second unclosed think gets closed - Only operates on
role: "assistant"messages, so user content with these tags is untouched
_message_to_dict — nice improvement over the pure json.dumps(default=str) path. model_dump(exclude_none=True) preserves Pydantic objects properly. The v is not None comprehension is redundant with exclude_none=True but harmless as a defensive guard.
SimpleEngine integration — this PR also fixes a pre-existing gap: SimpleEngine's chat() and _stream_generate_text() previously didn't normalize tool-call arguments at all. Now both engines share the same normalization path.
Module structure — extracting to chat_template_safety.py is clean. The old _normalize_tool_call_arguments_for_template in batched.py is now a trivial passthrough — could be inlined later, but that's a minor cleanup.
Tests cover the key scenarios: dangling think insertion, original message immutability, and Pydantic-style message handling.
Single commit, CI green. LGTM.
Summary
Fixes #493.
This adds a small pre-template message normalization guard for assistant history where a thinking block is left open before raw tool XML:
Before
apply_chat_template, the normalizer now inserts</think>before the first<tool_call>that follows a dangling<think>. That keeps tool XML outside the reasoning span for both bundled and user-supplied templates.Credit to Cheuk-Yiu Chan for identifying and documenting the Qwen 3.6 template failure mode and template-side repair:
Why
Qwen 3.6 can sustain interleaved thinking across turns. If malformed assistant history leaves
<think>open before<tool_call>, the next rendered prompt can condition the model as though the tool call is still reasoning. That can lead to ignored tool calls, reasoning leakage into tool turns, or polluted preserved history.This patch handles the structural transcript hygiene issue before template rendering, rather than relying on a specific template fork.
Changes
vllm_mlx.engine.chat_template_safety.normalize_messages_for_chat_template.tool_calls[].function.argumentsmapping normalization.apply_chat_template.<think>before raw<tool_call>without mutating caller messages.Validation
python -m pytest tests/test_batched_engine.py -q-> 11 passedAI_RUNTIME_BYPASS_SAFETY_GATE=1 /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_batched_engine.py tests/test_chat_template_kwargs.py -q-> 20 passed