Support parallel execution of multiple tool calls in DefaultToolCallingManager #5195

@kimehdgns

Description

Expected Behavior

When an LLM returns multiple independent tool calls in a single response (via parallelToolCalls=true),
they should be executed concurrently to reduce overall latency.

Example with 3 independent tool calls (each taking ~2-3s):

Tool 1 ──2.5s──▶
Tool 2 ──3.0s────▶    Total: ~3.0s (max of all)
Tool 3 ──2.5s──▶

Current Behavior

DefaultToolCallingManager.executeToolCall() executes tool calls sequentially in a for loop:

for (AssistantMessage.ToolCall toolCall : assistantMessage.getToolCalls()) {
    logger.debug("Executing tool call: {}", toolCall.name());
    // ...
    String toolCallResult = toolCallback.call(finalToolInputArguments, toolContext);
    // ...
}

This results in:

Tool 1 ──2.5s──▶ Tool 2 ──3.0s──▶ Tool 3 ──2.5s──▶  Total: ~8.0s

The parallelToolCalls option only controls whether the LLM can return multiple tool calls at once,
not how they are executed on the client side.

Context

How has this issue affected you?

I'm building an orchestration agent that routes requests to multiple sub-agents via tool calling.
When a user asks something like "Get info about user A, user B, and department C",
the LLM correctly returns 3 independent tool calls, but they execute sequentially,
resulting in 3x the latency compared to concurrent execution.

What are you trying to accomplish?

Reduce response latency for use cases involving multiple independent tool calls:

  • Multi-agent orchestration (host agent calling multiple sub-agents)
  • Data aggregation from multiple sources
  • Batch lookups (multiple user/entity queries)

What other alternatives have you considered?

  1. Custom ToolCallingManager: Implement a custom ToolCallingManager with concurrent execution,
    but this requires duplicating most of the existing logic.

  2. Batch tool design: Create a single tool that accepts multiple queries,
    but this pushes complexity into the tool implementation and forgoes the LLM's
    natural per-tool selection.

  3. Application-level parallelism: Disable internal tool execution and manage it manually,
    but this defeats the purpose of the framework's tool calling abstraction.

Are you aware of any workarounds?

Setting internalToolExecutionEnabled=false and implementing concurrent execution manually
around the ToolCallingManager API works, but it is verbose and error-prone, as the sketch
below illustrates.
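
As a rough illustration, here is a minimal sketch of that workaround, assuming
internalToolExecutionEnabled=false, an application-managed executor and toolContext,
and a hypothetical callbacksByName map (Map<String, ToolCallback>) that resolves
tools by name directly rather than going through ToolCallingManager:

AssistantMessage assistantMessage = chatResponse.getResult().getOutput();
List<AssistantMessage.ToolCall> toolCalls = assistantMessage.getToolCalls();

// Fan out: submit every tool call before joining any of them
List<CompletableFuture<ToolResponseMessage.ToolResponse>> futures = toolCalls.stream()
    .map(toolCall -> CompletableFuture.supplyAsync(() -> {
        ToolCallback callback = callbacksByName.get(toolCall.name()); // hypothetical lookup
        String result = callback.call(toolCall.arguments(), toolContext);
        return new ToolResponseMessage.ToolResponse(toolCall.id(), toolCall.name(), result);
    }, executor))
    .toList();

// Joining in list order preserves the original tool_calls ordering
List<ToolResponseMessage.ToolResponse> responses = futures.stream()
    .map(CompletableFuture::join)
    .toList();

// Feed the results back to the model on the next request
ToolResponseMessage toolResponseMessage = new ToolResponseMessage(responses);

Total latency becomes the maximum of the individual calls instead of their sum, but
every application has to re-implement ordering, error handling, and returnDirect
logic by hand.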

Proposed Solution

Add a configuration option for concurrent tool execution:

spring:
  ai:
    tool:
      execution:
        mode: concurrent  # or 'sequential' (default)
        pool-size: 10
        timeout-ms: 30000

Or at the chat options level:

OpenAiChatOptions.builder()
    .parallelToolCalls(true)                        // LLM returns multiple calls (existing)
    .toolExecutionMode(ToolExecutionMode.CONCURRENT) // Execute concurrently (new)
    .build();

Implementation Considerations

  1. Response ordering: Results must be returned in the same order as the original tool_calls
    array to maintain consistency with LLM expectations and sequential execution behavior.

    List<CompletableFuture<IndexedToolResponse>> futures =
        IntStream.range(0, toolCalls.size())
            .mapToObj(i -> CompletableFuture.supplyAsync(() ->
                new IndexedToolResponse(i, execute(toolCalls.get(i))), executor))
            .toList();

    // Join all futures, then restore the original tool_calls order
    List<IndexedToolResponse> results = futures.stream()
        .map(CompletableFuture::join)
        .sorted(Comparator.comparingInt(IndexedToolResponse::index))
        .toList();
  2. Error handling: Use a "collect-all" strategy rather than fail-fast:
    execute all tools to completion and collect individual errors, matching the current
    sequential behavior, where one tool's failure doesn't prevent subsequent tools
    from executing (see the sketch below).
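
    A minimal sketch, reusing the hypothetical execute(...) helper from item 1 and
    adding a per-call timeout that maps to the proposed timeout-ms option:

    // Each future converts its own failure (or timeout) into an error result
    // instead of failing the whole batch; IndexedToolResponse.error(...) is a
    // hypothetical factory for an error-carrying response.
    CompletableFuture<IndexedToolResponse> future = CompletableFuture
        .supplyAsync(() -> new IndexedToolResponse(i, execute(toolCalls.get(i))), executor)
        .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
        .exceptionally(ex -> IndexedToolResponse.error(i, toolCalls.get(i), ex));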

  3. returnDirect handling: When multiple tools have different returnDirect values,
    the final value should be the logical AND of all results, preserving current
    behavior (see the one-liner below).
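
    For example, assuming the same hypothetical callbacksByName lookup as above
    (getToolMetadata().returnDirect() is the existing per-tool flag):

    // Only return directly if every executed tool requested it (logical AND)
    boolean returnDirect = toolCalls.stream()
        .map(tc -> callbacksByName.get(tc.name()))
        .allMatch(cb -> cb.getToolMetadata().returnDirect());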

  4. Thread safety: ToolContext exposes its state via Collections.unmodifiableMap,
    so concurrent reads are safe; only mutable values stored inside the context
    would need additional care.

  5. Executor configuration: Allow customization of the thread pool used for concurrent
    execution to prevent resource exhaustion in high-throughput scenarios (sketched below).
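
    One possible shape, assuming the proposed spring.ai.tool.execution.pool-size
    property is bindable as shown:

    // A bounded, application-managed pool caps concurrent tool executions
    @Bean(destroyMethod = "shutdown")
    ExecutorService toolExecutionExecutor(
            @Value("${spring.ai.tool.execution.pool-size:10}") int poolSize) {
        return Executors.newFixedThreadPool(poolSize);
    }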

  6. Backward compatibility: Default to sequential execution (mode: sequential) to ensure
    existing applications are not affected.

  7. Naming: Use concurrent instead of parallel to differentiate from the existing
    parallelToolCalls option which controls LLM response behavior, not client execution.

Labels

  • enhancement
  • tool-calling
