
[BUG] Stream mode loses toolCall information and records cumulative textContent in tool calling loops #5167

@SONGY0

Description


Suggested Labels

  • type: bug
  • severity: critical
  • area: observation
  • area: streaming
  • model: openai

πŸ› Bug Description

In the internalStream() method, when a tool calling loop occurs, the first Observation has two critical issues:

Issue 1: 🔴 toolCall information is completely lost

The first Observation records toolCalls as null instead of the actual tool call information.

Root Cause: When the flatMap operator detects a toolCall, it directly returns the Flux from the second internalStream call without emitting the response containing the toolCall, causing MessageAggregator to miss the toolCall information.

Issue 2: 🟠 textContent records a cumulative value

The first Observation records a cumulative textContent (containing text from both the first and second model calls) instead of only the output from the first call.

Root Cause: flatMap flattens the Flux returned by the second internalStream directly into the first stream, causing MessageAggregator to aggregate chunks from both calls.


🔄 Steps to Reproduce

Prerequisites

  • Spring AI 1.1.0
  • Using streaming mode .stream()
  • Tool calling configured

Reproduction Steps

  1. Create a prompt that requires tool calling:

    chatClient.prompt()
        .user("Check today's weather in Beijing, then tell me what clothes to wear")
        .stream()
  2. Model returns tool call request (first call):

    {
      "textContent": "I need to call the get_weather tool.",
      "toolCalls": [
        {
          "id": "call_001",
          "name": "get_weather",
          "arguments": "{\"city\":\"Beijing\"}"
        }
      ]
    }
  3. Execute tool call: get_weather(city="Beijing") → Returns "Beijing: Sunny, 5°C"

  4. Model returns final advice (second call):

    {
      "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
      "toolCalls": null
    }
  5. Observe the recorded data in a custom ObservationHandler


✅ Expected Behavior

Two model calls should produce two independent Observations, each recording its own data:

// First Observation (logId=1)
{
  "textContent": "I need to call the get_weather tool.",
  "toolCalls": [
    {
      "id": "call_001",
      "name": "get_weather",
      "arguments": "{\"city\":\"Beijing\"}"
    }
  ]
}

// Second Observation (logId=2)
{
  "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  "toolCalls": null
}

❌ Actual Behavior

// First Observation (logId=1) - ❌ INCORRECT
{
  "textContent": "I need to call the get_weather tool.Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  // ↑ Cumulative content from both calls (A + B)

  "toolCalls": null  // ❌ Lost the toolCall information from the first call!
}

// Second Observation (logId=2) - ✅ CORRECT
{
  "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  "toolCalls": null
}

Critical Issues:

  • toolCall information completely lost
  • textContent incorrectly cumulative

πŸ” Root Cause Analysis

Source Location

org.springframework.ai.openai.OpenAiChatModel.java:366-401

Problem Code

public Flux<ChatResponse> internalStream(Prompt prompt, ChatResponse previousChatResponse) {
    return Flux.deferContextual(contextView -> {
        // Get streaming response from first model call
        Flux<OpenAiApi.ChatCompletionChunk> completionChunks =
            this.openAiApi.chatCompletionStream(request, ...);

        Flux<ChatResponse> chatResponse = completionChunks
            .map(this::chunkToChatCompletion)
            .switchMap(...);

        // ❌ PROBLEM: flatMap handles tool calling
        Flux<ChatResponse> flux = chatResponse.flatMap(response -> {
            if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
                // Execute tool
                ToolExecutionResult toolExecutionResult =
                    this.toolCallingManager.executeToolCalls(prompt, response);

                if (!toolExecutionResult.returnDirect()) {
                    // ❌ Issue 1: Directly returns the Flux from the second internalStream
                    //            and does NOT emit the current response (containing the toolCall)
                    return this.internalStream(
                        new Prompt(toolExecutionResult.conversationHistory(), ...),
                        response
                    );
                }
                // (returnDirect branch elided for brevity)
            }
            // ✅ Normal case: emit current response
            return Flux.just(response);
        })
        .doOnFinally(s -> observation.stop());

        // ❌ Issue 2: MessageAggregator aggregates the flattened flux, which
        //            contains partial chunks from the first call + all chunks from the second call
        return new MessageAggregator().aggregate(flux, observationContext::setResponse);
    });
}

Detailed Analysis

Issue 1: flatMap does not emit the response containing toolCall

// Actual behavior of flatMap:
chunk1 (no toolCall) → Flux.just(chunk1) → emit chunk1 ✅
chunk2 (no toolCall) → Flux.just(chunk2) → emit chunk2 ✅
chunk3 (no toolCall) → Flux.just(chunk3) → emit chunk3 ✅
...
chunkN (has toolCall) → this.internalStream(...)
                      → returns Flux[chunkB1, chunkB2, ..., chunkBM]
                      → ❌ chunkN itself is NOT emitted!

// Result after flatMap flattening:
Flux[chunk1, chunk2, chunk3, ..., chunk(N-1), chunkB1, chunkB2, ..., chunkBM]
//   ↑ Chunks from first call (excluding chunkN)  ↑ Chunks from second call

Result: MessageAggregator cannot access chunkN containing the toolCall, causing toolCall information loss.
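The dropped-element behavior can be reproduced outside Reactor. Below is a hypothetical plain-Java sketch (the class name FlatMapDrop and the "*" marker are invented for illustration) that uses java.util.stream's flatMap the same way: the mapper for the toolCall chunk returns only the second call's chunks, so the toolCall chunk itself never reaches the downstream consumer.

```java
import java.util.List;
import java.util.stream.Stream;

public class FlatMapDrop {

    // Mimics the problematic flatMap: a chunk ending in "*" stands in for
    // chunkN (the one carrying the toolCall). Its mapper returns the second
    // call's chunks instead of the chunk itself, so "*" never appears downstream.
    static List<String> flatten(List<String> firstCall, List<String> secondCall) {
        return firstCall.stream()
            .flatMap(chunk -> chunk.endsWith("*")
                ? secondCall.stream()   // recursion replaces the chunk entirely
                : Stream.of(chunk))     // normal chunks pass through unchanged
            .toList();
    }

    public static void main(String[] args) {
        // chunkN ("cN*") is swallowed; only c1, c2 and the second call's chunks survive
        System.out.println(flatten(List.of("c1", "c2", "cN*"), List.of("B1", "B2")));
        // prints [c1, c2, B1, B2]
    }
}
```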

Issue 2: MessageAggregator aggregates chunks from both calls

MessageAggregator's aggregation logic:

// doOnNext: Accumulate text from each chunk
textContent.append(chunk1.text);  // "I"
textContent.append(chunk2.text);  // " need"
textContent.append(chunk3.text);  // " to"
...
textContent.append(chunk(N-1).text);  // " get_weather tool."
// chunkN not passed, so toolCalls cannot be accumulated

textContent.append(chunkB1.text);  // "Based"
textContent.append(chunkB2.text);  // " on"
...
textContent.append(chunkBM.text);  // "thick coat..."

// doOnComplete: Final result
finalTextContent = "I need to call the get_weather tool.Based on the current weather...thick coat..." // A + B
finalToolCalls = null  // Because chunkN was not passed
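The cumulative result can be sketched the same way. The helper below (hypothetical, not the MessageAggregator source) appends every chunk it receives, which is the append-only behavior described above; fed the flattened stream from both calls, it yields A + B.

```java
import java.util.List;

public class CumulativeAggregation {

    // Append-only accumulation, as described for MessageAggregator's doOnNext:
    // every chunk's text is appended to one builder, so chunks from a second
    // model call flattened into the same stream end up in the same aggregate.
    static String aggregate(List<String> flattenedChunks) {
        StringBuilder text = new StringBuilder();
        for (String chunk : flattenedChunks) {
            text.append(chunk);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String result = aggregate(List.of(
            "I need to call the get_weather tool.", // text from the first call
            "Based on the current weather..."));    // text from the second call
        System.out.println(result); // A + B concatenated, with no separator
    }
}
```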

💡 Proposed Solutions

Solution 1: Minimal Change - Emit current response before recursion

Approach: Use Flux.concat() in flatMap to emit the response containing toolCall first, then recursively call the second internalStream.

Flux<ChatResponse> flux = chatResponse.flatMap(response -> {
    if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
        ToolExecutionResult toolExecutionResult =
            this.toolCallingManager.executeToolCalls(prompt, response);

        if (!toolExecutionResult.returnDirect()) {
            // ✅ Emit current response (containing toolCall) first
            return Flux.concat(
                Flux.just(response),        // Ensure toolCall is passed to MessageAggregator
                this.internalStream(...)    // Then recursively call
            );
        }
        // (returnDirect branch elided for brevity)
    }
    return Flux.just(response);
})
.doOnFinally(s -> observation.stop());

return new MessageAggregator().aggregate(flux, observationContext::setResponse);

Pros:

  • ✅ Minimal code change (a single Flux.concat in one place)
  • ✅ Ensures the toolCall is not lost (the response is passed to MessageAggregator)

Cons:

  • ❌ Still has the cumulative textContent issue (outer observation records A + B)
  • ❌ Behavior inconsistent with synchronous calls

Use Case: As a quick fix for the critical toolCall loss issue.
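In the same plain-Java terms as above (hypothetical names, java.util.stream instead of Reactor), the Flux.concat fix corresponds to re-emitting the toolCall chunk before the recursive stream, so the downstream consumer still sees it:

```java
import java.util.List;
import java.util.stream.Stream;

public class ConcatFix {

    // Same shape as the buggy flatMap, but the toolCall chunk ("*"-marked,
    // standing in for chunkN) is re-emitted via Stream.concat before the
    // second call's chunks, so it is no longer dropped.
    static List<String> flatten(List<String> firstCall, List<String> secondCall) {
        return firstCall.stream()
            .flatMap(chunk -> chunk.endsWith("*")
                ? Stream.concat(Stream.of(chunk), secondCall.stream()) // keep chunkN
                : Stream.of(chunk))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(flatten(List.of("c1", "cN*"), List.of("B1", "B2")));
        // prints [c1, cN*, B1, B2]; the toolCall chunk survives
    }
}
```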


Solution 2: Complete Fix - Complete current Observation before recursion (Recommended ⭐)

Approach: Utilize MessageAggregator's callback mechanism to stop the current observation immediately after aggregation completes, ensuring each model call has an independent observation.

public Flux<ChatResponse> internalStream(Prompt prompt, ChatResponse previousChatResponse) {
    return Flux.deferContextual(contextView -> {
        Flux<ChatResponse> chatResponse = completionChunks
            .map(this::chunkToChatCompletion)
            .switchMap(...);

        // Create current observation
        final ChatModelObservationContext observationContext =
            ChatModelObservationContext.builder()
                .prompt(prompt)
                .provider(OpenAiApiConstants.PROVIDER_NAME)
                .build();

        Observation observation = ChatModelObservationDocumentation.CHAT_MODEL_OPERATION
            .observation(...)
            .start();

        // ✅ Key change: Complete aggregation and observation before handling toolCall
        return new MessageAggregator().aggregate(chatResponse, aggregatedResponse -> {
            // Callback after aggregation completes
            observationContext.setResponse(aggregatedResponse);
            observation.stop();  // ✅ Immediately complete current observation
        })
        .flatMap(aggregatedResponse -> {
            // Check if tool calling is needed
            if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
                ToolExecutionResult toolExecutionResult =
                    this.toolCallingManager.executeToolCalls(prompt, aggregatedResponse);

                if (!toolExecutionResult.returnDirect()) {
                    // ✅ Current observation completed, start new recursive call
                    // Recursive call creates a new observation
                    return this.internalStream(
                        new Prompt(toolExecutionResult.conversationHistory(), ...),
                        aggregatedResponse
                    );
                } else {
                    return Flux.just(aggregatedResponse);
                }
            } else {
                return Flux.just(aggregatedResponse);
            }
        })
        .doOnError(observation::error);
    });
}

Pros:

  • ✅ Completely solves the toolCall loss issue (each call recorded independently)
  • ✅ Completely solves the cumulative textContent issue (each MessageAggregator is independent)
  • ✅ Behavior fully consistent with the synchronous internalCall()
  • ✅ Clear concept: one model call = one complete, independent observation
  • ✅ Leaves room for future enhancements (can add parentObservationId, loopDepth, etc.)

Cons:

  • Requires code structure adjustment (move MessageAggregator before flatMap)

Behavior Comparison:

Synchronous call (internalCall):
  Model call 1 → Observation 1 completes → Tool execution → Model call 2 → Observation 2 completes ✅

Streaming call (current implementation):
  Model call 1 starts → Tool execution → Model call 2 → Observation 2 completes → Observation 1 completes (contains 1+2) ❌

Streaming call (Solution 2 implementation):
  Model call 1 → Observation 1 completes → Tool execution → Model call 2 → Observation 2 completes ✅

Use Case: As a long-term solution to ensure behavioral consistency between streaming and synchronous calls.


🌍 Environment Information

Version Info

  • Spring AI: 1.1.0
  • Spring Boot: 3.2.x
  • Java: 17+
  • Model Provider: OpenAI GPT-4 (confirmed)

Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.1.0</version>
</dependency>

Configuration

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4
          temperature: 0.7

📎 Related Code and Documentation

Problem Code Locations

  1. OpenAI Implementation: org.springframework.ai.openai.OpenAiChatModel.java:271-404
  2. MessageAggregator: org.springframework.ai.chat.model.MessageAggregator.java:56-189
  3. ChatModelObservationContext: org.springframework.ai.chat.observation.ChatModelObservationContext.java

Related Issues

  • This issue may be related to inconsistent behavior between synchronous and streaming calls
  • Recommend checking streaming implementations for other model providers (Anthropic, Azure OpenAI, etc.)

πŸ™ Expectations

We hope the Spring AI team can:

  1. Confirm the issue: Verify whether this behavior is as expected
  2. Provide guidance: If this is a design decision, please provide official best practice recommendations
  3. Fix the issue: If this is a bug, we hope it can be fixed in future releases
  4. Update documentation: Document the behavioral differences between streaming and synchronous calls in tool calling loops

πŸ“ Additional Notes

Why This Issue is Critical

  1. Inconsistency: Completely different behavior between synchronous (.call()) and streaming (.stream()) calls

    • Synchronous: Each model call is an independent Observation ✅
    • Streaming: The first Observation accumulates data from subsequent calls ❌
  2. Data Integrity: Loss of toolCall information means:

    • Cannot perform complete audit trails
    • Cannot debug tool calling failures
    • Cannot reproduce problem scenarios
  3. Production Impact: In enterprise applications:

    • Compliance requirements cannot be met
    • SLA monitoring data is unreliable
    • Troubleshooting costs significantly increase

Testing Recommendations

We recommend the Spring AI team add the following test cases:

  1. Streaming + Single Tool Call - Verify toolCall information is correctly recorded
  2. Streaming + Multiple Tool Calls - Verify behavior with multiple loops
  3. Streaming + Nested Agents - Verify behavior in complex scenarios
  4. Observation Data Consistency Tests - Compare observation data between synchronous and streaming modes

Thank you to the Spring AI team for their hard work! Looking forward to seeing this issue resolved. 🙏


Submitted by: Song Yang
Contact: [email protected]
Date: 2026-01-01
