Grammar, reasoning and jinja #12204
Replies: 3 comments
-
cc @ochafik
-
Hi @supersteves, thanks for testing the jinja mode out! Jinja should work w/ grammar (cf. basic test); could you share which model, CLI args and exact query you've been using?
Yes, ideally waiting for OpenAI to update their API to see where the wind blows. @ngxson is also keen to have a per-query param :-)
-
Hi @ochafik, thanks for getting back. I can't share the exact query (it includes sensitive aspects) and would need to put together a simpler test, which isn't something I can work on at this point. Next time I'm playing with local models I'll look again. I can answer some of your questions, though.
(I normally use the Qwen 32B distill, but it's too slow on my M2 Mac for rapid testing.) I built from source yesterday.
-
Today I played with the latest llama.cpp built from source, trying out the reasonably recent `reasoning_format` CLI arg (along with the required `jinja` arg) in my grammar-based use case. (Note: it would be nice if `reasoning_format` could at some point be another secret/undocumented JSON property in the OAI endpoint... perhaps when jinja is more mature and on by default.)

It seems that enabling jinja disables GBNF grammar; the `grammar` property of the chat completions request JSON is ignored. Is this expected (for now)?
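For reference, this is roughly the shape of request I mean (a sketch only; the server flags are the ones I'm testing, while the endpoint, grammar and field names here are illustrative placeholders):

```python
# Sketch: pass a GBNF grammar to the chat completions endpoint while the
# server runs with jinja enabled, e.g. started as:
#   llama-server -m model.gguf --jinja --reasoning-format deepseek
import requests

# Trivial placeholder grammar: constrain the answer to "yes" or "no".
GRAMMAR = r'''
root ::= "yes" | "no"
'''

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Is 7 a prime number?"}],
        "grammar": GRAMMAR,  # llama.cpp-specific extension to the OAI request schema
    },
    timeout=120,
)
# With jinja enabled this constraint appears to be ignored; without it,
# the content comes back as "yes" or "no" as expected.
print(resp.json()["choices"][0]["message"]["content"])
```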
It also seems that deepseek is the default when jinja is used, so the `<think>...</think>` block is extracted as `reasoning_content`. However, when `reasoning_format` is specified explicitly as `none`, or jinja is not used, `<think>` is preserved in the output. It's a little odd to have two different defaults, but I guess this is an evolving area. (And I wonder when OpenAI themselves will update their APIs to emit this or similar.)
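Concretely, the difference looks roughly like this (again a sketch; the response fields are just how I'm reading the behaviour above, not authoritative):

```python
# Sketch: where the reasoning ends up depending on the server flags.
#   --jinja (deepseek reasoning format by default, it seems):
#       message["reasoning_content"] holds the <think>...</think> text,
#       message["content"] holds only the final answer.
#   --reasoning-format none, or no --jinja:
#       message["content"] still contains the raw <think>...</think> block.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "What is 17 * 23?"}]},
    timeout=120,
)
message = resp.json()["choices"][0]["message"]
print("reasoning_content:", message.get("reasoning_content"))  # populated only when reasoning is extracted
print("content:", message["content"])                          # may still contain <think>...</think>
```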
I'm going to continue running without these args, but am considering updating my grammar to support `<think>`. This way I can get the benefits of structured outputs at the same time as capturing the reasoning tokens. I wonder if there are any plans to do this built in. (I figure that using a reasoning model with a grammar that prevents it from emitting reasoning tokens will actually degrade the intelligence of the model, since it is no longer able to "auto-regressively" use its own thought process later on in generating the final output.)
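Roughly what I have in mind, as an untested sketch (the rules below are placeholders for my real schema):

```python
# Sketch: a grammar that allows an optional free-form <think>...</think> block
# before the structured answer, so reasoning tokens and structured output can
# coexist. Note the think rule is simplistic: it just avoids matching "</".
THINK_GRAMMAR = r'''
root   ::= think? answer
think  ::= "<think>" ( [^<] | "<" [^/] )* "</think>" "\n"
answer ::= "{\"result\": " [0-9]+ "}"
'''
# This string would go in the "grammar" field of the request, as in the earlier
# sketch; the client then splits the <think> block out of the content itself.
```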
This is a great project, by the way. Good work!