Structured Generation with Reasoning Parser in offline mode. #17638
psych0v0yager announced in General
Replies: 1 comment
-
Qwen3 uses the `<think>` tag to control whether it outputs its reasoning process. I guess the chat template that vLLM uses when running Qwen3 doesn't automatically add the tag. You could try using the …
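A minimal sketch of that idea (the `force_thinking` helper and the example prompt string are hypothetical, not vLLM or Qwen code): if the chat template does not open the thinking block itself, you can append the `<think>` tag to the already-formatted prompt so generation starts inside the reasoning block.

```python
def force_thinking(formatted_prompt: str, think_tag: str = "<think>") -> str:
    """Hypothetical helper: append an opening think tag to a prompt that the
    chat template has already formatted, so the model is forced to begin
    with reasoning. Assumes the template did not add the tag itself."""
    if formatted_prompt.rstrip().endswith(think_tag):
        return formatted_prompt  # template already added it, leave untouched
    return formatted_prompt + think_tag + "\n"

# Example with a Qwen-style chat-formatted prompt (illustrative only).
prompt = (
    "<|im_start|>user\nWhat is the capital of Texas?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(force_thinking(prompt).endswith("<think>\n"))  # True
```

Whether this plays nicely with a grammar-constrained decoder depends on when the constraint kicks in, which is exactly the question below.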
-
According to the Qwen docs (https://qwen.readthedocs.io/en/latest/deployment/vllm.html) and the vLLM docs (https://docs.vllm.ai/en/latest/features/reasoning_outputs.html), it is currently not possible to use a reasoning parser together with structured generation in offline mode.
What is currently blocking this feature? I would like to use the latest Qwen3 to generate some synthetic data. Ideally, Qwen3 would reason about the request and then output its response as structured JSON. Currently, when I apply structured JSON output in offline mode, the model does not generate any thinking. Likewise, there is currently no reasoning parser in vLLM's offline generation.
It would be nice to do the following:
Question: What is the capital of Texas?
Raw response:
generated thinking
{"output": "Austin"}
TL;DR: apply freeform generation for the thinking phase, then structured generation for the final response. Can this be implemented with clever workarounds in the current version of vLLM, or will it require some backend modification?