Add smallFastModel configuration for lightweight tasks #2791
Description
What would you like to be added?
A configuration option to specify a "small/fast" model (e.g., smallFastModel or flashModel) that Qwen Code can automatically use for lightweight, low-stakes tasks, while keeping the main model for complex reasoning and code generation.
The idea is to use a smaller, faster model for structured, low-complexity tasks like:
- Generating commit messages or branch names
- Creating short titles/summaries for sessions
- Simple parsing or extraction tasks
- Quick verification or classification
- Tool call routing decisions
Why is this needed?
Speed and cost efficiency: Many internal operations don't require the full capability of a large model. Using a smaller, faster model for these tasks would:
- Reduce latency for simple operations (sub-second responses vs. waiting for a large model)
- Lower token costs for routine tasks
- Improve overall user experience by not bottlenecking simple operations on a heavy model
Clear pattern from industry: This is a well-established pattern — use a small, fast model for lightweight, structured, low-stakes tasks where speed and cost matter more than raw capability. Reserve the main model for complex reasoning, code generation, and multi-step problem solving.
Additional context
Current state: Qwen Code uses a single model for all tasks. There's no concept of automatic model routing based on task type. The closest existing mechanisms are:
- Subagent model selection (manual per-agent configuration)
- The /model command (manual user switching)
Proposed behavior:
- Add a smallFastModel setting in settings.json (optional; if not set, fall back to the main model)
- Internally route lightweight tasks (titles, summaries, parsing, verification) to the small fast model
- Keep the main model for conversation, code generation, and complex reasoning
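To make the routing rule concrete, here is a minimal sketch of how a task could be mapped to a model. The `TaskKind` names, the `ModelConfig` shape, and the `pickModel` function are all hypothetical and only illustrate the idea; they are not existing Qwen Code APIs.

```typescript
// Hypothetical task categories, taken from the lists in this issue.
type TaskKind =
  | "conversation"
  | "codeGeneration"
  | "complexReasoning"
  | "title"
  | "summary"
  | "parsing"
  | "verification";

// Hypothetical config shape mirroring the proposed settings.json keys.
interface ModelConfig {
  model: string; // main model
  smallFastModel?: string; // proposed optional setting
}

// Tasks considered lightweight enough for the small fast model.
const LIGHTWEIGHT_TASKS: ReadonlySet<TaskKind> = new Set<TaskKind>([
  "title",
  "summary",
  "parsing",
  "verification",
]);

// Returns the small fast model for lightweight tasks when one is configured;
// otherwise falls back to the main model.
function pickModel(task: TaskKind, config: ModelConfig): string {
  if (LIGHTWEIGHT_TASKS.has(task) && config.smallFastModel) {
    return config.smallFastModel;
  }
  return config.model;
}
```

With this shape, leaving smallFastModel unset keeps today's single-model behavior, so the setting stays fully opt-in.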
Example configuration:
{
  "model": "qwen3.5-plus",
  "smallFastModel": "qwen-turbo"
}
Or via environment variable:
export QWEN_SMALL_FAST_MODEL=qwen-turbo
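
One possible way to combine the two sources is sketched below. It assumes the environment variable takes precedence over settings.json, which this issue does not specify and is open for discussion; the `Settings` interface and `resolveSmallFastModel` function are hypothetical names used for illustration only.

```typescript
// Hypothetical subset of settings.json relevant to this proposal.
interface Settings {
  model: string;
  smallFastModel?: string;
}

// Resolves the model to use for lightweight tasks.
// Assumed precedence: QWEN_SMALL_FAST_MODEL, then settings.smallFastModel,
// then the main model as the fallback.
function resolveSmallFastModel(
  settings: Settings,
  env: Record<string, string | undefined>,
): string {
  const fromEnv = env["QWEN_SMALL_FAST_MODEL"]?.trim();
  if (fromEnv) {
    return fromEnv;
  }
  const fromSettings = settings.smallFastModel?.trim();
  if (fromSettings) {
    return fromSettings;
  }
  return settings.model;
}
```

In a Node.js process the second argument would typically be process.env. Empty or whitespace-only values are treated as unset here, so an accidental `export QWEN_SMALL_FAST_MODEL=` does not mask the settings.json value.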