Grammar, reasoning and jinja #12204
Replies: 3 comments
-
cc @ochafik
-
Hi @supersteves, thanks for testing the jinja mode out! Jinja should work w/ grammar (cf. basic test); could you share which model, CLI args and exact query you've been using?
Yes, ideally waiting for OpenAI to update their API to see where the wind blows. @ngxson is also keen to have a per-query param :-)
-
Hi @ochafik, thanks for getting back. I can't share the exact query (it includes sensitive aspects) and would need to put together a simpler test, which isn't something I can work on at this point. Next time I'm playing with local models I'll look again. I can answer some of your questions, though.
(I normally use the Qwen 32B distill, but it's too slow on my M2 Mac for rapid testing.) I built from source yesterday.
-
Today I played with the latest llama.cpp built from source, trying out the reasonably recent `reasoning_format` CLI arg (along with the required `jinja` arg) in my grammar-based use case. (Note: it would be nice if `reasoning_format` could at some point be another secret/undocumented JSON property in the OAI endpoint... perhaps when jinja is more mature and on by default.)

It seems that enabling jinja disables GBNF grammar; the `grammar` property of the chat completions request JSON is ignored. Is this expected (for now)?
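For reference, this is roughly the shape of request I mean (a sketch only; the server flags are the ones I'm testing, while the endpoint, grammar and field names here are illustrative placeholders):

```python
# Sketch: pass a GBNF grammar to the chat completions endpoint while the
# server runs with jinja enabled, e.g. started as:
#   llama-server -m model.gguf --jinja --reasoning-format deepseek
import requests

# Trivial placeholder grammar: constrain the answer to "yes" or "no".
GRAMMAR = r'''
root ::= "yes" | "no"
'''

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Is 7 a prime number?"}],
        "grammar": GRAMMAR,  # llama.cpp-specific extension to the OAI request schema
    },
    timeout=120,
)
# With jinja enabled this constraint appears to be ignored; without it,
# the content comes back as "yes" or "no" as expected.
print(resp.json()["choices"][0]["message"]["content"])
```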
It also seems that deepseek is the default when jinja is used, so the `<think>...</think>` block is extracted as `reasoning_content`. However, when `reasoning_format` is specified explicitly as `none`, or jinja is not used, `<think>` is preserved in the output. It's a little odd to have two different defaults, but I guess this is an evolving area. (And I wonder when OpenAI themselves will update their APIs to emit this or similar.)
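Concretely, the difference looks roughly like this (again a sketch; the response fields are just how I'm reading the behaviour above, not authoritative):

```python
# Sketch: where the reasoning ends up depending on the server flags.
#   --jinja (deepseek reasoning format by default, it seems):
#       message["reasoning_content"] holds the <think>...</think> text,
#       message["content"] holds only the final answer.
#   --reasoning-format none, or no --jinja:
#       message["content"] still contains the raw <think>...</think> block.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "What is 17 * 23?"}]},
    timeout=120,
)
message = resp.json()["choices"][0]["message"]
print("reasoning_content:", message.get("reasoning_content"))  # populated only when reasoning is extracted
print("content:", message["content"])                          # may still contain <think>...</think>
```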
I'm going to continue running without these args, but am considering updating my grammar to support `<think>`. This way I can get the benefits of structured outputs at the same time as capturing the reasoning tokens. I wonder if there are any plans to do this built in. (I figure that using a reasoning model with a grammar that prevents it from emitting reasoning tokens will actually degrade the intelligence of the model, since it is no longer able to "auto-regressively" use its own thought process later on in generating the final output.)
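Roughly what I have in mind, as an untested sketch (the rules below are placeholders for my real schema):

```python
# Sketch: a grammar that allows an optional free-form <think>...</think> block
# before the structured answer, so reasoning tokens and structured output can
# coexist. Note the think rule is simplistic: it just avoids matching "</".
THINK_GRAMMAR = r'''
root   ::= think? answer
think  ::= "<think>" ( [^<] | "<" [^/] )* "</think>" "\n"
answer ::= "{\"result\": " [0-9]+ "}"
'''
# This string would go in the "grammar" field of the request, as in the earlier
# sketch; the client then splits the <think> block out of the content itself.
```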
This is a great project, by the way. Good work!