Inject default thinking_token_budget and presence_penalty for vLLM, fixing the gap where --override-generation-config doesn't propagate these fields. Prevents Qwen3 thinking-mode infinite loops.
-
Updated
Apr 26, 2026 - Shell
Inject default thinking_token_budget and presence_penalty for vLLM, fixing the gap where --override-generation-config doesn't propagate these fields. Prevents Qwen3 thinking-mode infinite loops.
vLLM v0.21 + Gemma 4 DFlash + real thinking_token_budget enforcement on Blackwell (sm_120 / sm_121a)
vLLM v0.21 + Qwen 3.6 DFlash + real thinking_token_budget enforcement on Blackwell (sm_120 / sm_121a)
Add a description, image, and links to the thinking-budget topic page so that developers can more easily learn about it.
To associate your repository with the thinking-budget topic, visit your repo's landing page and select "manage topics."