Description
Hi Qwen Team,
Not sure if this is the right place to post this proposal, but I don't know of anywhere else to post it and have a discussion.
I'm testing Qwen3.5 with some agents that involve many tool calls and multi-turn interactions. One thing that catches my attention is that the prompt cache hit rate drops every time I send a new message.
I found that this is because the chat template does not render the reasoning parts that appear before the last user message into the final prompt. As a result, the rendered prompt for a new turn is no longer a prefix extension of the previous one, so those messages cannot hit the prompt cache. This strategy helps reduce context length, and it has little impact when there are few messages between two user queries.
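To make the cache behavior concrete, here is a minimal sketch. The `render` function and tags below are hypothetical simplifications of the actual Qwen chat template, but they illustrate why omitting earlier reasoning blocks breaks prefix caching:

```python
# Hypothetical, simplified renderer (NOT the real Qwen template) showing
# how dropping reasoning from history changes the rendered prompt prefix.

def render(messages, keep_history_reasoning):
    parts = []
    last_user = max(i for i, m in enumerate(messages) if m["role"] == "user")
    for i, m in enumerate(messages):
        if m["role"] == "assistant":
            think = m.get("reasoning_content", "")
            # Mimics the current template behavior: reasoning that appears
            # before the last user message is omitted from the prompt.
            if think and (keep_history_reasoning or i > last_user):
                parts.append(
                    f"<|im_start|>assistant\n<think>\n{think}\n</think>\n"
                    f"{m['content']}<|im_end|>"
                )
            else:
                parts.append(f"<|im_start|>assistant\n{m['content']}<|im_end|>")
        else:
            parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

turn1 = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello", "reasoning_content": "plan..."},
]
turn2 = turn1 + [{"role": "user", "content": "next question"}]

# With history reasoning stripped, turn 2's prompt no longer extends
# turn 1's prompt, so turn 1's KV cache cannot be reused.
print(render(turn2, False).startswith(render(turn1, False)))  # False: cache miss

# Keeping reasoning in history preserves the shared prefix.
print(render(turn2, True).startswith(render(turn1, True)))    # True: cache hit
```

In the first case the assistant turn was rendered with its `<think>` block while it was the latest turn, then re-rendered without it once a new user message arrived, which is exactly the prefix divergence that invalidates the cache.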
But in the Deep Agent case, where tool calls are much more frequent, each reasoning block tends to be small (perhaps tens of tokens), so the savings are marginal and the overhead of re-processing these messages becomes much harder to justify.
So, I think it's better to offer a choice between reducing history context length and reducing prefill latency.
I've already submitted a PR to fix the chat template here: https://huggingface.co/Qwen/Qwen3.5-35B-A3B/discussions/60
I'd appreciate your feedback when you have a chance. Thanks!
Reproduction
1. Enable thinking in an agent that makes many tool calls across multiple turns.
2. Send a new message and observe the prompt cache hit rate drop.
Logs
Environment Information
Known Issue