Description
When tools are included in the API request payload, Qwen3.5-27B sharply reduces thinking tokens — the `<think>` block contains only a few sentences. The same prompt without tools produces 45-65 seconds of deep, multi-step reasoning. This is not a gradual degradation — it is a binary on/off behavior that makes it impossible to use tool-calling and reasoning together.
This severely limits agentic use cases where the model needs to think deeply about a problem before deciding whether to call a tool or answer directly.
Analysis
The chat template's tool instruction block contains:
- "If you choose to call a function ONLY reply in the following format with NO suffix"
- "You may provide optional reasoning for your function call"
We patched these instructions to encourage reasoning ("Think carefully and reason thoroughly..."), but the behavior did not change. This confirms the suppression is not driven by the chat template instructions — it is a trained behavior embedded in the model weights. When the model sees `<tools>` in its tokenized input, it switches to a "fast tool-call mode" that bypasses extended reasoning entirely.
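The mechanism can be illustrated with a deliberately simplified stand-in for the chat template (this is not the real Qwen template — just a sketch of how the `<tools>` block enters the rendered prompt):

```python
import json

def render_prompt(user_msg, tools=None):
    # Hypothetical, heavily simplified stand-in for a Qwen-style chat
    # template; it only mimics how a <tools> block enters the prompt.
    parts = ["<|im_start|>system\nYou are a helpful assistant."]
    if tools:
        tool_json = "\n".join(json.dumps(t) for t in tools)
        parts.append(
            "<tools>\n" + tool_json + "\n</tools>\n"
            "If you choose to call a function ONLY reply in the "
            "following format with NO suffix"
        )
    parts.append("<|im_end|>\n<|im_start|>user\n" + user_msg + "<|im_end|>")
    return "\n".join(parts)

with_tools = render_prompt("hi", tools=[{"name": "web_search"}])
without_tools = render_prompt("hi")
print("<tools>" in with_tools, "<tools>" in without_tools)  # True False
```

Since patching the instruction text around this block changed nothing, the trigger appears to be the presence of the `<tools>` tokens themselves, not the wording surrounding them.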
Impact
This creates a fundamental conflict for agentic deployments:
- With tools enabled: The model can call functions but cannot reason deeply about complex questions
- Without tools enabled: The model reasons brilliantly but cannot call any functions
There is no middle ground. Users must choose between intelligence and capability.
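Until this is fixed at the model level, one possible client-side workaround (a sketch, not an official vLLM or Qwen feature) is a two-pass request: pass 1 obtains full reasoning with no tools, pass 2 re-sends the question together with that reasoning and the tool schema. The payload construction could look like:

```python
def build_two_pass_payloads(model, user_msg, tools):
    """Hypothetical two-pass workaround: reason first without tools,
    then decide on tool use with the pass-1 reasoning as context."""
    pass1 = {  # tools deliberately omitted so full reasoning is triggered
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": False,
    }

    def pass2(reasoning_text):
        # Feed the pass-1 reasoning back in, now with tools enabled.
        return {
            "model": model,
            "messages": [
                {"role": "user", "content": user_msg},
                {"role": "assistant", "content": "Prior analysis: " + reasoning_text},
                {"role": "user", "content": "Based on that analysis, answer directly or call a tool."},
            ],
            "tools": tools,
            "stream": False,
        }

    return pass1, pass2
```

This roughly doubles latency and token cost per turn, which is exactly why a model- or template-level fix is preferable.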
Expected behavior
The model should be able to reason thoroughly even when tools are available, especially when:
- The question does not require any tool calls
- The question requires deep logical reasoning before deciding whether to use a tool
- `enable_thinking: true` is explicitly set
Suggested improvement
- Allow configurable reasoning depth independently of tool presence
- Or: Only suppress extended reasoning after the model has decided to make a tool call, not preemptively when tools are merely available in the schema
- Or: Respect `thinking_budget` in `chat_template_kwargs` even when tools are present
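For the third option, vLLM already forwards `chat_template_kwargs` from the request body to the template, so a request exercising it could look like the sketch below — honoring `thinking_budget` when `tools` is also present is precisely what does not work today:

```python
# Sketch of a request payload combining tools with chat_template_kwargs.
# Desired behavior: the thinking kwargs should still take effect even
# though "tools" is present in the same payload.
payload = {
    "model": "Kbenkhaled/Qwen3.5-27B-NVFP4",
    "messages": [{"role": "user", "content": "Die Autowaschanlage ist 2 Minuten weit weg. Soll ich zu Fuß oder mit dem Auto hin?"}],
    "tools": [{"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}}}}}],
    "chat_template_kwargs": {"enable_thinking": True, "thinking_budget": 4096},
}
```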
Comparison data
| Setup | Reasoning duration | Reasoning tokens | Correct answer |
|---|---|---|---|
| No tools, with system prompt | 47-66s | ~3000-5000 | ✅ Yes |
| With tools (even a single dummy tool) | 2-4s | 300-500 | ⚠️ Correct but shallow |
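The gap in the table is wide enough that the suppression can be flagged automatically, e.g. in a regression test against the server. A minimal sketch using the rough token ranges measured above (thresholds are illustrative, taken from this report's numbers):

```python
def reasoning_mode(reasoning_tokens, suppressed_max=500, full_min=3000):
    """Classify a completion by reasoning-token count, using the rough
    ranges from the comparison table (illustrative thresholds)."""
    if reasoning_tokens <= suppressed_max:
        return "suppressed"
    if reasoning_tokens >= full_min:
        return "full"
    return "intermediate"

print(reasoning_mode(400))   # suppressed (with-tools range)
print(reasoning_mode(4000))  # full (no-tools range)
```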
Reproduction
Test prompt
Die Autowaschanlage ist 2 Minuten weit weg. Soll ich zu Fuß oder mit dem Auto hin?
(Translation: "The car wash is 2 minutes away. Should I walk or drive?")
This is a logic puzzle — the correct answer requires reasoning that the car must physically be at the car wash to be washed, therefore you must drive.
Without tools (full reasoning works)
```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer test" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kbenkhaled/Qwen3.5-27B-NVFP4",
    "messages": [{"role": "user", "content": "Die Autowaschanlage ist 2 Minuten weit weg. Soll ich zu Fuß oder mit dem Auto hin?"}],
    "stream": false,
    "max_tokens": 8192
  }'
```
Result: reasoning_content contains ~2000-4000 tokens of deep multi-step reasoning (47-66 seconds). The model correctly identifies the constraint that the car must be at the wash.
With tools (reasoning suppressed)
```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer test" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kbenkhaled/Qwen3.5-27B-NVFP4",
    "messages": [{"role":"user","content":"Die Autowaschanlage ist 2 Minuten weit weg. Soll ich zu Fuß oder mit dem Auto hin?"}],
    "stream": false,
    "max_tokens": 8192,
    "tools": [{"type":"function","function":{"name":"web_search","description":"Search the web","parameters":{"type":"object","properties":{"query":{"type":"string"}}}}}]
  }' | python3 -c "
import sys, json
d = json.load(sys.stdin)
m = d['choices'][0]['message']
# Check BOTH fields
rc = m.get('reasoning_content', '') or ''
r = m.get('reasoning', '') or ''
c = m.get('content', '') or ''
print(f'reasoning_content: {len(rc)} chars')
print(f'reasoning: {len(r)} chars')
print(f'content: {len(c)} chars')
if r:
    print(f'reasoning[:500]: {r[:500]}')
if rc:
    print(f'reasoning_content[:500]: {rc[:500]}')
print(f'Usage: {d.get(\"usage\", {})}')
"
```
```text
=== REASONING FIELD CHECK ===
reasoning_content: 0 chars
reasoning: 884 chars
content: 914 chars
reasoning[:500]: The user is asking whether they should walk or drive to a car wash that is 2 minutes away. This is a decision that depends on various factors like convenience, time, and practical considerations.
Let me think about this logically:
1. Distance: 2 minutes away - this is very close
2. If it's 2 minutes walking distance, that's roughly 150-200 meters
3. If it's 2 minutes driving distance, that's also very close
For such a short distance, walking is usually the better choice because:
- No need to
Usage: {'prompt_tokens': 291, 'total_tokens': 754, 'completion_tokens': 463, 'prompt_tokens_details': None}
```
Result: `reasoning_content` is empty and `reasoning` holds only 884 chars (a few hundred tokens). The model jumps almost directly to generating content with minimal reasoning. Total output is ~460 completion tokens, duration ~3 seconds.
Environment
- Model: `Kbenkhaled/Qwen3.5-27B-NVFP4` (also reproducible with other Qwen3.5 variants)
- Inference: vLLM v0.17.1 (`vllm/vllm-openai:v0.17.1-cu130`)
- GPU: NVIDIA RTX PRO 6000 Blackwell (96 GB)
- vLLM flags: `--reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser qwen3_coder`