Support parallel execution of multiple tool calls in DefaultToolCallingManager #5195

@kimehdgns

Description

Expected Behavior

When an LLM returns multiple independent tool calls in a single response (via parallelToolCalls=true),
they should be executed concurrently to reduce overall latency.

Example with 3 independent tool calls (each taking ~2-3s):

Tool 1 ──2.5s──▶
Tool 2 ──3.0s────▶    Total: ~3.0s (max of all)
Tool 3 ──2.5s──▶

Current Behavior

DefaultToolCallingManager.executeToolCall() executes tool calls sequentially in a for loop:

for (AssistantMessage.ToolCall toolCall : assistantMessage.getToolCalls()) {
    logger.debug("Executing tool call: {}", toolCall.name());
    // ...
    String toolCallResult = toolCallback.call(finalToolInputArguments, toolContext);
    // ...
}

This results in:

Tool 1 ──2.5s──▶ Tool 2 ──3.0s──▶ Tool 3 ──2.5s──▶  Total: ~8.0s

The parallelToolCalls option only controls whether the LLM can return multiple tool calls at once,
not how they are executed on the client side.

Context

How has this issue affected you?

I'm building an orchestration agent that routes requests to multiple sub-agents via tool calling.
When a user asks something like "Get info about user A, user B, and department C",
the LLM correctly returns 3 independent tool calls, but they execute sequentially,
resulting in 3x the latency compared to concurrent execution.

What are you trying to accomplish?

Reduce response latency for use cases involving multiple independent tool calls:

  • Multi-agent orchestration (host agent calling multiple sub-agents)
  • Data aggregation from multiple sources
  • Batch lookups (multiple user/entity queries)

What other alternatives have you considered?

  1. Custom ToolCallingManager: Implement a custom ToolCallingManager with concurrent execution,
    but this requires duplicating most of the existing logic.

  2. Batch tool design: Create a single tool that accepts multiple queries,
    but this pushes complexity into the tool implementation and forgoes the LLM's
    natural per-tool selection.

  3. Application-level parallelism: Disable internal tool execution and manage it manually,
    but this defeats the purpose of the framework's tool calling abstraction.

Are you aware of any workarounds?

Setting internalToolExecutionEnabled=false and implementing concurrent execution manually
around the ToolCallingManager API works, but it is verbose and error-prone, as the sketch
below illustrates.
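
As a rough illustration, here is a minimal sketch of that workaround, assuming
internalToolExecutionEnabled=false, an application-managed executor and toolContext,
and a hypothetical callbacksByName map (Map<String, ToolCallback>) that resolves
tools by name directly rather than going through ToolCallingManager:

AssistantMessage assistantMessage = chatResponse.getResult().getOutput();
List<AssistantMessage.ToolCall> toolCalls = assistantMessage.getToolCalls();

// Fan out: submit every tool call before joining any of them
List<CompletableFuture<ToolResponseMessage.ToolResponse>> futures = toolCalls.stream()
    .map(toolCall -> CompletableFuture.supplyAsync(() -> {
        ToolCallback callback = callbacksByName.get(toolCall.name()); // hypothetical lookup
        String result = callback.call(toolCall.arguments(), toolContext);
        return new ToolResponseMessage.ToolResponse(toolCall.id(), toolCall.name(), result);
    }, executor))
    .toList();

// Joining in list order preserves the original tool_calls ordering
List<ToolResponseMessage.ToolResponse> responses = futures.stream()
    .map(CompletableFuture::join)
    .toList();

// Feed the results back to the model on the next request
ToolResponseMessage toolResponseMessage = new ToolResponseMessage(responses);

Total latency becomes the maximum of the individual calls instead of their sum, but
every application has to re-implement ordering, error handling, and returnDirect
logic by hand.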

Proposed Solution

Add a configuration option for concurrent tool execution:

spring:
  ai:
    tool:
      execution:
        mode: concurrent  # or 'sequential' (default)
        pool-size: 10
        timeout-ms: 30000

Or at the chat options level:

OpenAiChatOptions.builder()
    .parallelToolCalls(true)                        // LLM returns multiple calls (existing)
    .toolExecutionMode(ToolExecutionMode.CONCURRENT) // Execute concurrently (new)
    .build();

Implementation Considerations

  1. Response ordering: Results must be returned in the same order as the original tool_calls
    array to maintain consistency with LLM expectations and sequential execution behavior.

    List<CompletableFuture<IndexedToolResponse>> futures =
        IntStream.range(0, toolCalls.size())
            .mapToObj(i -> CompletableFuture.supplyAsync(() ->
                new IndexedToolResponse(i, execute(toolCalls.get(i))), executor))
            .toList();

    // Join all futures, then restore the original tool_calls order
    List<IndexedToolResponse> results = futures.stream()
        .map(CompletableFuture::join)
        .sorted(Comparator.comparingInt(IndexedToolResponse::index))
        .toList();
  2. Error handling: Use a "collect-all" strategy rather than fail-fast:
    execute all tools to completion and collect individual errors, matching the current
    sequential behavior, where one tool's failure doesn't prevent subsequent tools
    from executing (see the sketch below).
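
    A minimal sketch, reusing the hypothetical execute(...) helper from item 1 and
    adding a per-call timeout that maps to the proposed timeout-ms option:

    // Each future converts its own failure (or timeout) into an error result
    // instead of failing the whole batch; IndexedToolResponse.error(...) is a
    // hypothetical factory for an error-carrying response.
    CompletableFuture<IndexedToolResponse> future = CompletableFuture
        .supplyAsync(() -> new IndexedToolResponse(i, execute(toolCalls.get(i))), executor)
        .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
        .exceptionally(ex -> IndexedToolResponse.error(i, toolCalls.get(i), ex));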

  3. returnDirect handling: When multiple tools have different returnDirect values,
    the final value should be the logical AND of all results, preserving current
    behavior (see the one-liner below).
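
    For example, assuming the same hypothetical callbacksByName lookup as above
    (getToolMetadata().returnDirect() is the existing per-tool flag):

    // Only return directly if every executed tool requested it (logical AND)
    boolean returnDirect = toolCalls.stream()
        .map(tc -> callbacksByName.get(tc.name()))
        .allMatch(cb -> cb.getToolMetadata().returnDirect());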

  4. Thread safety: ToolContext exposes its state via Collections.unmodifiableMap,
    so concurrent reads are safe; only mutable values stored inside the context
    would need additional care.

  5. Executor configuration: Allow customization of the thread pool used for concurrent
    execution to prevent resource exhaustion in high-throughput scenarios (sketched below).
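
    One possible shape, assuming the proposed spring.ai.tool.execution.pool-size
    property is bindable as shown:

    // A bounded, application-managed pool caps concurrent tool executions
    @Bean(destroyMethod = "shutdown")
    ExecutorService toolExecutionExecutor(
            @Value("${spring.ai.tool.execution.pool-size:10}") int poolSize) {
        return Executors.newFixedThreadPool(poolSize);
    }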

  6. Backward compatibility: Default to sequential execution (mode: sequential) to ensure
    existing applications are not affected.

  7. Naming: Use concurrent instead of parallel to differentiate from the existing
    parallelToolCalls option which controls LLM response behavior, not client execution.

Labels

  • enhancement
  • tool-calling
