thinking-budget

Here are 3 public repositories matching this topic.

palmfuture / vllm-default-thinking-budget

Injects default thinking_token_budget and presence_penalty values into vLLM requests, closing the gap where --override-generation-config does not propagate these fields. This prevents Qwen3 thinking-mode infinite loops.

monkey-patch vllm llm-inference qwen reasoning-models qwen3 thinking-budget sampling-params

Updated Apr 26, 2026
Shell
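
The default-injection idea above amounts to a "fill only what the client omitted" merge. The sketch below is a hypothetical illustration, not the repository's code: the dictionary name, the function name and the default values (4096, 1.5) are assumptions chosen for the example, and it operates on a plain request dict rather than any vLLM type.

```python
from typing import Any

# Hypothetical server-side defaults; these values are illustrative, not the repo's.
DEFAULT_SAMPLING_FIELDS: dict[str, Any] = {
    "thinking_token_budget": 4096,
    "presence_penalty": 1.5,
}


def inject_defaults(request: dict[str, Any],
                    defaults: dict[str, Any] = DEFAULT_SAMPLING_FIELDS) -> dict[str, Any]:
    """Return a copy of `request` with any missing default fields filled in.

    Fields the client set explicitly (even to None) are kept unchanged,
    and the caller's dict is never mutated.
    """
    merged = dict(request)
    for key, value in defaults.items():
        merged.setdefault(key, value)
    return merged


req = {"prompt": "hi", "presence_penalty": 0.0}
print(inject_defaults(req))
```

Here the client's explicit presence_penalty of 0.0 survives, and only thinking_token_budget is added. Keeping explicit values intact is what makes this a default rather than an override.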

davidzha712 / vllm-gemma4-dflash-budget


vLLM v0.21 + Gemma 4 DFlash + real thinking_token_budget enforcement on Blackwell (sm_120 / sm_121a)

gemma reasoning blackwell vllm speculative-decoding nvfp4 dgx-spark cuda-graphs dflash gemma-4 thinking-budget

Updated May 26, 2026
Python

davidzha712 / vllm-qwen3-dflash-budget


vLLM v0.21 + Qwen 3.6 DFlash + real thinking_token_budget enforcement on Blackwell (sm_120 / sm_121a)

reasoning blackwell fp8 vllm qwen speculative-decoding qwen3 dgx-spark cuda-graphs dflash thinking-budget

Updated May 26, 2026
Python
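
Both DFlash repositories advertise thinking_token_budget enforcement. As a rough, generic sketch of what enforcing such a budget can mean (not either repository's implementation), the state machine below counts tokens emitted after a start-of-thinking token and substitutes the end-of-thinking token once the budget is spent. The class name and the token IDs are hypothetical. Real IDs depend on the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ThinkingBudget:
    """Force the end-of-thinking token once `budget` reasoning tokens were emitted.

    think_start_id / think_end_id are hypothetical token IDs for illustration.
    """
    budget: int
    think_start_id: int
    think_end_id: int
    in_thinking: bool = False
    used: int = 0

    def next_token(self, proposed: int) -> int:
        """Given the token the sampler proposed, return the token to emit."""
        if not self.in_thinking:
            if proposed == self.think_start_id:
                # Entering a reasoning block: start counting from zero.
                self.in_thinking = True
                self.used = 0
            return proposed
        if proposed == self.think_end_id:
            # The model closed its reasoning on its own.
            self.in_thinking = False
            return proposed
        if self.used >= self.budget:
            # Budget exhausted: override the sampler and close the block.
            self.in_thinking = False
            return self.think_end_id
        self.used += 1
        return proposed


tb = ThinkingBudget(budget=2, think_start_id=1, think_end_id=2)
print([tb.next_token(t) for t in [1, 5, 6, 7, 8]])  # [1, 5, 6, 2, 8]
```

This sketch handles one token at a time. With speculative decoding, several draft tokens are verified per step, so an implementation also has to stop accepting drafted reasoning tokens that would overrun the budget. The sketch does not model that.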