Structured Generation with Reasoning Parser in offline mode. #17638
psych0v0yager announced in General
Replies: 1 comment
-
Qwen3 uses the `<think>` tag to control whether it outputs its reasoning process. I guess the chat template that vLLM uses when running Qwen3 doesn't automatically add the tag. You could try using the …
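A minimal sketch of that idea (the `force_thinking` helper and the example prompt string are hypothetical, not vLLM or Qwen code): if the chat template does not open the thinking block itself, you can append the `<think>` tag to the already-formatted prompt so generation starts inside the reasoning block.

```python
def force_thinking(formatted_prompt: str, think_tag: str = "<think>") -> str:
    """Hypothetical helper: append an opening think tag to a prompt that the
    chat template has already formatted, so the model is forced to begin
    with reasoning. Assumes the template did not add the tag itself."""
    if formatted_prompt.rstrip().endswith(think_tag):
        return formatted_prompt  # template already added it, leave untouched
    return formatted_prompt + think_tag + "\n"

# Example with a Qwen-style chat-formatted prompt (illustrative only).
prompt = (
    "<|im_start|>user\nWhat is the capital of Texas?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(force_thinking(prompt).endswith("<think>\n"))  # True
```

Whether this plays nicely with a grammar-constrained decoder depends on when the constraint kicks in, which is exactly the question below.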
-
According to the Qwen docs (https://qwen.readthedocs.io/en/latest/deployment/vllm.html) and the vLLM docs (https://docs.vllm.ai/en/latest/features/reasoning_outputs.html), it is currently not possible to use a reasoning parser together with structured generation in offline mode.
What is currently blocking this feature? I would like to use the latest Qwen3 to generate some synthetic data. Ideally, Qwen3 would reason about the request and then output its response as structured JSON. Currently, when I apply structured JSON output in offline mode, the model does not generate any thinking. Likewise, there is currently no reasoning parser in vLLM's offline generation.
It would be nice to do the following:
Question: What is the capital of Texas?
Raw response:
generated thinking
{"output": "Austin"}
TL;DR: apply freeform generation for the thinking phase, then structured generation for the final response. Can this be implemented with clever workarounds in the current version of vLLM, or will it require some backend modification?