
[BUG] Stream mode loses toolCall information and records cumulative textContent in tool calling loops #5167

@SONGY0

Description


Suggested Labels

  • type: bug
  • severity: critical
  • area: observation
  • area: streaming
  • model: openai

πŸ› Bug Description

In the internalStream() method, when a tool calling loop occurs, the first Observation has two critical issues:

Issue 1: 🔴 toolCall information is completely lost

The first Observation records toolCalls as null instead of the actual tool call information.

Root Cause: When the flatMap operator detects a toolCall, it directly returns the Flux from the second internalStream call without emitting the response containing the toolCall, causing MessageAggregator to miss the toolCall information.

Issue 2: 🟠 textContent records a cumulative value

The first Observation records a cumulative textContent (containing text from both the first and second model calls) instead of only the output from the first call.

Root Cause: flatMap flattens the Flux returned by the second internalStream directly into the first stream, causing MessageAggregator to aggregate chunks from both calls.


🔄 Steps to Reproduce

Prerequisites

  • Spring AI 1.1.0
  • Using streaming mode .stream()
  • Tool calling configured

Reproduction Steps

  1. Create a prompt that requires tool calling:

    chatClient.prompt()
        .user("Check today's weather in Beijing, then tell me what clothes to wear")
        .stream()
  2. Model returns tool call request (first call):

    {
      "textContent": "I need to call the get_weather tool.",
      "toolCalls": [
        {
          "id": "call_001",
          "name": "get_weather",
          "arguments": "{\"city\":\"Beijing\"}"
        }
      ]
    }
  3. Execute tool call: get_weather(city="Beijing") → Returns "Beijing: Sunny, 5°C"

  4. Model returns final advice (second call):

    {
      "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
      "toolCalls": null
    }
  5. Observe the recorded data in a custom ObservationHandler


✅ Expected Behavior

Two model calls should produce two independent Observations, each recording its own data:

// First Observation (logId=1)
{
  "textContent": "I need to call the get_weather tool.",
  "toolCalls": [
    {
      "id": "call_001",
      "name": "get_weather",
      "arguments": "{\"city\":\"Beijing\"}"
    }
  ]
}

// Second Observation (logId=2)
{
  "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  "toolCalls": null
}

❌ Actual Behavior

// First Observation (logId=1) - ❌ INCORRECT
{
  "textContent": "I need to call the get_weather tool.Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  // ↑ Cumulative content from both calls (A + B)

  "toolCalls": null  // ❌ Lost the toolCall information from the first call!
}

// Second Observation (logId=2) - ✅ CORRECT
{
  "textContent": "Based on the current weather (Sunny, 5°C), I recommend wearing a down jacket or thick coat...",
  "toolCalls": null
}

Critical Issues:

  • toolCall information completely lost
  • textContent incorrectly cumulative

πŸ” Root Cause Analysis

Source Location

org.springframework.ai.openai.OpenAiChatModel.java:366-401

Problem Code

public Flux<ChatResponse> internalStream(Prompt prompt, ChatResponse previousChatResponse) {
    return Flux.deferContextual(contextView -> {
        // Get streaming response from first model call
        Flux<OpenAiApi.ChatCompletionChunk> completionChunks =
            this.openAiApi.chatCompletionStream(request, ...);

        Flux<ChatResponse> chatResponse = completionChunks
            .map(this::chunkToChatCompletion)
            .switchMap(...);

        // ❌ PROBLEM: flatMap handles tool calling
        Flux<ChatResponse> flux = chatResponse.flatMap(response -> {
            if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
                // Execute tool
                ToolExecutionResult toolExecutionResult =
                    this.toolCallingManager.executeToolCalls(prompt, response);

                if (!toolExecutionResult.returnDirect()) {
                    // ❌ Issue 1: Directly returns the Flux from the second internalStream
                    //            and does NOT emit the current response (containing the toolCall)
                    return this.internalStream(
                        new Prompt(toolExecutionResult.conversationHistory(), ...),
                        response
                    );
                }
                // (returnDirect branch elided for brevity)
            }
            // ✅ Normal case: emit current response
            return Flux.just(response);
        })
        .doOnFinally(s -> observation.stop());

        // ❌ Issue 2: MessageAggregator aggregates the flattened flux, which
        //            contains partial chunks from the first call + all chunks from the second call
        return new MessageAggregator().aggregate(flux, observationContext::setResponse);
    });
}

Detailed Analysis

Issue 1: flatMap does not emit the response containing toolCall

// Actual behavior of flatMap:
chunk1 (no toolCall) → Flux.just(chunk1) → emit chunk1 ✅
chunk2 (no toolCall) → Flux.just(chunk2) → emit chunk2 ✅
chunk3 (no toolCall) → Flux.just(chunk3) → emit chunk3 ✅
...
chunkN (has toolCall) → this.internalStream(...)
                      → returns Flux[chunkB1, chunkB2, ..., chunkBM]
                      → ❌ chunkN itself is NOT emitted!

// Result after flatMap flattening:
Flux[chunk1, chunk2, chunk3, ..., chunk(N-1), chunkB1, chunkB2, ..., chunkBM]
//   ↑ Chunks from first call (excluding chunkN)  ↑ Chunks from second call

Result: MessageAggregator cannot access chunkN containing the toolCall, causing toolCall information loss.
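The dropped-element behavior can be reproduced outside Reactor. Below is a hypothetical plain-Java sketch (the class name FlatMapDrop and the "*" marker are invented for illustration) that uses java.util.stream's flatMap the same way: the mapper for the toolCall chunk returns only the second call's chunks, so the toolCall chunk itself never reaches the downstream consumer.

```java
import java.util.List;
import java.util.stream.Stream;

public class FlatMapDrop {

    // Mimics the problematic flatMap: a chunk ending in "*" stands in for
    // chunkN (the one carrying the toolCall). Its mapper returns the second
    // call's chunks instead of the chunk itself, so "*" never appears downstream.
    static List<String> flatten(List<String> firstCall, List<String> secondCall) {
        return firstCall.stream()
            .flatMap(chunk -> chunk.endsWith("*")
                ? secondCall.stream()   // recursion replaces the chunk entirely
                : Stream.of(chunk))     // normal chunks pass through unchanged
            .toList();
    }

    public static void main(String[] args) {
        // chunkN ("cN*") is swallowed; only c1, c2 and the second call's chunks survive
        System.out.println(flatten(List.of("c1", "c2", "cN*"), List.of("B1", "B2")));
        // prints [c1, c2, B1, B2]
    }
}
```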

Issue 2: MessageAggregator aggregates chunks from both calls

MessageAggregator's aggregation logic:

// doOnNext: Accumulate text from each chunk
textContent.append(chunk1.text);  // "I"
textContent.append(chunk2.text);  // " need"
textContent.append(chunk3.text);  // " to"
...
textContent.append(chunk(N-1).text);  // " get_weather tool."
// chunkN not passed, so toolCalls cannot be accumulated

textContent.append(chunkB1.text);  // "Based"
textContent.append(chunkB2.text);  // " on"
...
textContent.append(chunkBM.text);  // "thick coat..."

// doOnComplete: Final result
finalTextContent = "I need to call the get_weather tool.Based on the current weather...thick coat..." // A + B
finalToolCalls = null  // Because chunkN was not passed
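The cumulative result can be sketched the same way. The helper below (hypothetical, not the MessageAggregator source) appends every chunk it receives, which is the append-only behavior described above; fed the flattened stream from both calls, it yields A + B.

```java
import java.util.List;

public class CumulativeAggregation {

    // Append-only accumulation, as described for MessageAggregator's doOnNext:
    // every chunk's text is appended to one builder, so chunks from a second
    // model call flattened into the same stream end up in the same aggregate.
    static String aggregate(List<String> flattenedChunks) {
        StringBuilder text = new StringBuilder();
        for (String chunk : flattenedChunks) {
            text.append(chunk);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String result = aggregate(List.of(
            "I need to call the get_weather tool.", // text from the first call
            "Based on the current weather..."));    // text from the second call
        System.out.println(result); // A + B concatenated, with no separator
    }
}
```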

💡 Proposed Solutions

Solution 1: Minimal Change - Emit current response before recursion

Approach: Use Flux.concat() in flatMap to emit the response containing toolCall first, then recursively call the second internalStream.

Flux<ChatResponse> flux = chatResponse.flatMap(response -> {
    if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
        ToolExecutionResult toolExecutionResult =
            this.toolCallingManager.executeToolCalls(prompt, response);

        if (!toolExecutionResult.returnDirect()) {
            // ✅ Emit current response (containing toolCall) first
            return Flux.concat(
                Flux.just(response),        // Ensure toolCall is passed to MessageAggregator
                this.internalStream(...)    // Then recursively call
            );
        }
        // (returnDirect branch elided for brevity)
    }
    return Flux.just(response);
})
.doOnFinally(s -> observation.stop());

return new MessageAggregator().aggregate(flux, observationContext::setResponse);

Pros:

  • ✅ Minimal code change (a single Flux.concat in one place)
  • ✅ Ensures the toolCall is not lost (the response is passed to MessageAggregator)

Cons:

  • ❌ Still has the cumulative textContent issue (outer observation records A + B)
  • ❌ Behavior inconsistent with synchronous calls

Use Case: As a quick fix for the critical toolCall loss issue.
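In the same plain-Java terms as above (hypothetical names, java.util.stream instead of Reactor), the Flux.concat fix corresponds to re-emitting the toolCall chunk before the recursive stream, so the downstream consumer still sees it:

```java
import java.util.List;
import java.util.stream.Stream;

public class ConcatFix {

    // Same shape as the buggy flatMap, but the toolCall chunk ("*"-marked,
    // standing in for chunkN) is re-emitted via Stream.concat before the
    // second call's chunks, so it is no longer dropped.
    static List<String> flatten(List<String> firstCall, List<String> secondCall) {
        return firstCall.stream()
            .flatMap(chunk -> chunk.endsWith("*")
                ? Stream.concat(Stream.of(chunk), secondCall.stream()) // keep chunkN
                : Stream.of(chunk))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(flatten(List.of("c1", "cN*"), List.of("B1", "B2")));
        // prints [c1, cN*, B1, B2]; the toolCall chunk survives
    }
}
```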


Solution 2: Complete Fix - Complete current Observation before recursion (Recommended ⭐)

Approach: Utilize MessageAggregator's callback mechanism to stop the current observation immediately after aggregation completes, ensuring each model call has an independent observation.

public Flux<ChatResponse> internalStream(Prompt prompt, ChatResponse previousChatResponse) {
    return Flux.deferContextual(contextView -> {
        Flux<ChatResponse> chatResponse = completionChunks
            .map(this::chunkToChatCompletion)
            .switchMap(...);

        // Create current observation
        final ChatModelObservationContext observationContext =
            ChatModelObservationContext.builder()
                .prompt(prompt)
                .provider(OpenAiApiConstants.PROVIDER_NAME)
                .build();

        Observation observation = ChatModelObservationDocumentation.CHAT_MODEL_OPERATION
            .observation(...)
            .start();

        // ✅ Key change: Complete aggregation and observation before handling toolCall
        return new MessageAggregator().aggregate(chatResponse, aggregatedResponse -> {
            // Callback after aggregation completes
            observationContext.setResponse(aggregatedResponse);
            observation.stop();  // ✅ Immediately complete current observation
        })
        .flatMap(aggregatedResponse -> {
            // Check if tool calling is needed
            if (this.toolExecutionEligibilityPredicate.isToolExecutionRequired(...)) {
                ToolExecutionResult toolExecutionResult =
                    this.toolCallingManager.executeToolCalls(prompt, aggregatedResponse);

                if (!toolExecutionResult.returnDirect()) {
                    // ✅ Current observation completed, start new recursive call
                    // Recursive call creates a new observation
                    return this.internalStream(
                        new Prompt(toolExecutionResult.conversationHistory(), ...),
                        aggregatedResponse
                    );
                } else {
                    return Flux.just(aggregatedResponse);
                }
            } else {
                return Flux.just(aggregatedResponse);
            }
        })
        .doOnError(observation::error);
    });
}

Pros:

  • ✅ Completely solves the toolCall loss issue (each call recorded independently)
  • ✅ Completely solves the cumulative textContent issue (each MessageAggregator is independent)
  • ✅ Behavior fully consistent with the synchronous internalCall()
  • ✅ Clear concept: one model call = one complete, independent observation
  • ✅ Leaves room for future enhancements (can add parentObservationId, loopDepth, etc.)

Cons:

  • Requires code structure adjustment (move MessageAggregator before flatMap)

Behavior Comparison:

Synchronous call (internalCall):
  Model call 1 → Observation 1 completes → Tool execution → Model call 2 → Observation 2 completes ✅

Streaming call (current implementation):
  Model call 1 starts → Tool execution → Model call 2 → Observation 2 completes → Observation 1 completes (contains 1+2) ❌

Streaming call (Solution 2 implementation):
  Model call 1 → Observation 1 completes → Tool execution → Model call 2 → Observation 2 completes ✅

Use Case: As a long-term solution to ensure behavioral consistency between streaming and synchronous calls.


🌍 Environment Information

Version Info

  • Spring AI: 1.1.0
  • Spring Boot: 3.2.x
  • Java: 17+
  • Model Provider: OpenAI GPT-4 (confirmed)

Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>1.1.0</version>
</dependency>

Configuration

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4
          temperature: 0.7

📎 Related Code and Documentation

Problem Code Locations

  1. OpenAI Implementation: org.springframework.ai.openai.OpenAiChatModel.java:271-404
  2. MessageAggregator: org.springframework.ai.chat.model.MessageAggregator.java:56-189
  3. ChatModelObservationContext: org.springframework.ai.chat.observation.ChatModelObservationContext.java

Related Issues

  • This issue may be related to inconsistent behavior between synchronous and streaming calls
  • Recommend checking streaming implementations for other model providers (Anthropic, Azure OpenAI, etc.)

πŸ™ Expectations

We hope the Spring AI team can:

  1. Confirm the issue: Verify whether this behavior is as expected
  2. Provide guidance: If this is a design decision, please provide official best practice recommendations
  3. Fix the issue: If this is a bug, we hope it can be fixed in future releases
  4. Update documentation: Document the behavioral differences between streaming and synchronous calls in tool calling loops

πŸ“ Additional Notes

Why This Issue is Critical

  1. Inconsistency: Completely different behavior between synchronous (.call()) and streaming (.stream()) calls

    • Synchronous: Each model call is an independent Observation ✅
    • Streaming: The first Observation accumulates data from subsequent calls ❌
  2. Data Integrity: Loss of toolCall information means:

    • Cannot perform complete audit trails
    • Cannot debug tool calling failures
    • Cannot reproduce problem scenarios
  3. Production Impact: In enterprise applications:

    • Compliance requirements cannot be met
    • SLA monitoring data is unreliable
    • Troubleshooting costs significantly increase

Testing Recommendations

We recommend the Spring AI team add the following test cases:

  1. Streaming + Single Tool Call - Verify toolCall information is correctly recorded
  2. Streaming + Multiple Tool Calls - Verify behavior with multiple loops
  3. Streaming + Nested Agents - Verify behavior in complex scenarios
  4. Observation Data Consistency Tests - Compare observation data between synchronous and streaming modes

Thank you to the Spring AI team for their hard work! Looking forward to seeing this issue resolved. 🙏


Submitted by: Song Yang
Contact: [email protected]
Date: 2026-01-01
