Description
Expected Behavior
When an LLM returns multiple independent tool calls in a single response (via `parallelToolCalls=true`),
they should be executed concurrently to reduce overall latency.
Example with 3 independent tool calls (each taking ~2-3s):
Tool 1 ──2.5s──▶
Tool 2 ──3.0s────▶ Total: ~3.0s (max of all)
Tool 3 ──2.5s──▶
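For illustration, a minimal standalone sketch of the expected timing using plain `CompletableFuture` (the `lookupUser`/`lookupDepartment` tools are hypothetical stand-ins for real tool callbacks):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConcurrentTimingSketch {

	// Hypothetical stand-ins for real tool callbacks.
	static String lookupUser(String id) { sleep(2500); return "user " + id; }
	static String lookupDepartment(String id) { sleep(3000); return "dept " + id; }

	static void sleep(long ms) {
		try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
	}

	public static void main(String[] args) {
		ExecutorService executor = Executors.newFixedThreadPool(3);
		long start = System.nanoTime();
		CompletableFuture<String> f1 = CompletableFuture.supplyAsync(() -> lookupUser("A"), executor);
		CompletableFuture<String> f2 = CompletableFuture.supplyAsync(() -> lookupUser("B"), executor);
		CompletableFuture<String> f3 = CompletableFuture.supplyAsync(() -> lookupDepartment("C"), executor);
		CompletableFuture.allOf(f1, f2, f3).join();
		// Prints ~3.0s (the slowest call), not the ~8.0s sum of all three.
		System.out.printf("elapsed: %.1fs%n", (System.nanoTime() - start) / 1e9);
		executor.shutdown();
	}
}
```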
Current Behavior
`DefaultToolCallingManager.executeToolCall()` executes tool calls sequentially in a `for` loop:
```java
for (AssistantMessage.ToolCall toolCall : assistantMessage.getToolCalls()) {
	logger.debug("Executing tool call: {}", toolCall.name());
	// ...
	String toolCallResult = toolCallback.call(finalToolInputArguments, toolContext);
	// ...
}
```

This results in:
Tool 1 ──2.5s──▶ Tool 2 ──3.0s──▶ Tool 3 ──2.5s──▶ Total: ~8.0s
The `parallelToolCalls` option only controls whether the LLM may return multiple tool calls at once,
not how they are executed on the client side.
Context
How has this issue affected you?
I'm building an orchestration agent that routes requests to multiple sub-agents via tool calling.
When a user asks something like "Get info about user A, user B, and department C",
the LLM correctly returns 3 independent tool calls, but they execute sequentially,
resulting in 3x the latency compared to concurrent execution.
What are you trying to accomplish?
Reduce response latency for use cases involving multiple independent tool calls:
- Multi-agent orchestration (host agent calling multiple sub-agents)
- Data aggregation from multiple sources
- Batch lookups (multiple user/entity queries)
What other alternatives have you considered?
- Custom ToolCallingManager: Implement a custom `ToolCallingManager` with concurrent execution,
  but this requires duplicating most of the existing logic.
- Batch tool design: Create a single tool that accepts multiple queries,
  but this pushes complexity into the tool implementation and loses the LLM's natural tool selection.
- Application-level parallelism: Disable internal tool execution and manage it manually,
  but this defeats the purpose of the framework's tool-calling abstraction.
Are you aware of any workarounds?
Setting `internalToolExecutionEnabled=false` and implementing concurrent execution manually using the
`ToolCallingManager` API, but this is verbose and error-prone (a condensed sketch follows).
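A condensed sketch of such a manual workaround, fanning individual `ToolCallback` invocations out over an executor after disabling internal tool execution. Spring AI 1.x message types are assumed; the `ConcurrentToolExecutor` class and its name-to-callback map are illustrative, not framework API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.ToolResponseMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.tool.ToolCallback;

class ConcurrentToolExecutor {

	private final Map<String, ToolCallback> callbacks; // tool name -> callback, built by the app
	private final ExecutorService executor;

	ConcurrentToolExecutor(Map<String, ToolCallback> callbacks, ExecutorService executor) {
		this.callbacks = callbacks;
		this.executor = executor;
	}

	// chatResponse comes from a ChatModel call with internalToolExecutionEnabled=false,
	// so tool calls are handed back to the application instead of being run internally.
	ToolResponseMessage executeConcurrently(ChatResponse chatResponse) {
		AssistantMessage assistantMessage = chatResponse.getResult().getOutput();

		// Fan out: one future per tool call, all started before any is awaited.
		List<CompletableFuture<ToolResponseMessage.ToolResponse>> futures =
			assistantMessage.getToolCalls().stream()
				.map(toolCall -> CompletableFuture.supplyAsync(() -> {
					ToolCallback callback = callbacks.get(toolCall.name());
					if (callback == null) {
						throw new IllegalStateException("No callback for tool: " + toolCall.name());
					}
					String result = callback.call(toolCall.arguments());
					return new ToolResponseMessage.ToolResponse(toolCall.id(), toolCall.name(), result);
				}, executor))
				.toList();

		// Joining in list order preserves the original tool_calls order.
		return new ToolResponseMessage(futures.stream().map(CompletableFuture::join).toList());
	}
}
```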
Proposed Solution
Add a configuration option for concurrent tool execution:
```yaml
spring:
  ai:
    tool:
      execution:
        mode: concurrent  # or 'sequential' (default)
        pool-size: 10
        timeout-ms: 30000
```

Or at the chat options level:
```java
OpenAiChatOptions.builder()
	.parallelToolCalls(true)                          // LLM returns multiple calls (existing)
	.toolExecutionMode(ToolExecutionMode.CONCURRENT)  // Execute concurrently (new)
	.build();
```

Implementation Considerations
- Response ordering: Results must be returned in the same order as the original `tool_calls`
  array to maintain consistency with LLM expectations and sequential execution behavior:

```java
List<CompletableFuture<IndexedToolResponse>> futures = IntStream.range(0, toolCalls.size())
	.mapToObj(i -> CompletableFuture.supplyAsync(
		() -> new IndexedToolResponse(i, execute(toolCalls.get(i))), executor))
	.toList();

// Wait for all calls, then sort by original index (toList() is unmodifiable, so copy first).
List<IndexedToolResponse> results = new ArrayList<>(
	futures.stream().map(CompletableFuture::join).toList());
results.sort(Comparator.comparingInt(IndexedToolResponse::index));
```
- Error handling: Use a "collect-all" strategy rather than fail-fast.
  Execute all tools to completion and collect individual errors, matching current sequential behavior
  where one tool failure doesn't prevent subsequent tools from executing (see the sketch after this list).
- `returnDirect` handling: When multiple tools have different `returnDirect` values,
  the final value should be the logical AND of all results (preserving current behavior).
- Thread safety: `ToolContext` is already immutable (`Collections.unmodifiableMap`),
  so concurrent access is safe.
- Executor configuration: Allow customization of the thread pool used for concurrent execution
  to prevent resource exhaustion in high-throughput scenarios.
- Backward compatibility: Default to sequential execution (`mode: sequential`) to ensure
  existing applications are not affected.
- Naming: Use `concurrent` instead of `parallel` to differentiate from the existing
  `parallelToolCalls` option, which controls LLM response behavior, not client-side execution.
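A sketch of the collect-all error strategy and the `returnDirect` AND inside a hypothetical concurrent manager. The class and helper names are illustrative, not proposed API; Spring AI 1.x types (`ToolCallback`, `ToolMetadata`, `ToolResponseMessage`) are assumed:

```java
import java.util.List;

import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.ToolResponseMessage;
import org.springframework.ai.tool.ToolCallback;

// Hypothetical helpers inside a concurrent manager; not framework API.
class CollectAllSketch {

	// Collect-all: a failing tool yields an error result for that call only,
	// matching sequential behavior where one failure doesn't stop later tools.
	static ToolResponseMessage.ToolResponse executeOne(ToolCallback callback,
			AssistantMessage.ToolCall toolCall) {
		try {
			return new ToolResponseMessage.ToolResponse(
				toolCall.id(), toolCall.name(), callback.call(toolCall.arguments()));
		}
		catch (Exception ex) {
			return new ToolResponseMessage.ToolResponse(
				toolCall.id(), toolCall.name(), "Error: " + ex.getMessage());
		}
	}

	// returnDirect: return directly only if every executed tool requests it (logical AND).
	static boolean returnDirect(List<ToolCallback> executedCallbacks) {
		return executedCallbacks.stream()
			.allMatch(cb -> cb.getToolMetadata().returnDirect());
	}
}
```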
Related Issues
- #1778 - Support asynchronous/reactive function calling (FunctionCallback blocking limitation)
- #4434 - How spring-ai-starter-mcp-server-webflux honours non-blocking behaviour with Tool response being always synchronous? (non-blocking tool execution question)
- #2049 - Advancing Tool Support in Spring AI
Labels
enhancement, tool-calling