Problem
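The chat templates of R1/QwQ automatically append an opening thinking tag (<think>) to the generation prompt to force thinking. Since the tag is part of the prompt rather than the generated output, the client never receives it, and this breaks some UIs, e.g. ggml-org/llama.cpp#11861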
Solution
Detect the thinking template and return the opening thinking tag ("<think>") to the client ahead of the generated content.
For reference:
ggml-org/llama.cpp#11607
This would solve #294 and #299 too.
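A minimal sketch of the idea, not the project's actual implementation. It assumes the tag is the literal string "<think>" and that detection can be done on the rendered prompt (checking whether it ends with the tag). The names `prompt_forces_thinking` and `with_think_prefix` are hypothetical and used only for illustration.

```python
from typing import Iterable, Iterator

# Assumption: the thinking tag used by the R1/QwQ templates.
THINK_OPEN = "<think>"


def prompt_forces_thinking(rendered_prompt: str, tag: str = THINK_OPEN) -> bool:
    """Return True if the rendered prompt ends with the opening thinking tag.

    Trailing whitespace is ignored, since templates usually add a newline
    after the tag.
    """
    return rendered_prompt.rstrip().endswith(tag)


def with_think_prefix(
    rendered_prompt: str,
    chunks: Iterable[str],
    tag: str = THINK_OPEN,
) -> Iterator[str]:
    """Yield the generated chunks, preceded by the tag when the template forced it.

    The tag was consumed as part of the prompt, so the model never emits it;
    sending it first lets the client see a balanced opening/closing pair.
    """
    if prompt_forces_thinking(rendered_prompt, tag):
        yield tag + "\n"
    yield from chunks
```

With this sketch, a prompt ending in "<think>\n" causes the client to receive "<think>\n" before the first generated chunk, while prompts without the tag pass through unchanged.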
Alternatives
Hmm.. None?
Acknowledgements