-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Description
Deleted Queued Messages Create Duplicate Tool Results
Problem
When a user queues a message while a tool is executing, then deletes that queued message before tool completion, the deleted message text persists in internal task state and is incorrectly added as an extra tool_result block, causing a tool_use/tool_result count mismatch that violates Anthropic's API protocol constraints.
Context
This affects users who:
- Queue feedback messages while tools are executing (especially long-running terminal commands)
- Change their mind and delete the queued message before the tool completes
- Are using native tool protocol (default since v3.36.13, December 18, 2025)
Reproduction Steps
-
Environment: v3.38.3, Amazon Bedrock, native tool protocol enabled (default)
-
Actions:
- Start task with long-running
execute_commandtool - While executing, queue a message
- Delete the queued message before tool completes
- Wait for completion
- Start task with long-running
-
Result:
- System creates first tool_result with command output (correct)
- System adds second tool_result for same tool_use_id with deleted message text (incorrect)
- API error: "The number of toolResult blocks... exceeds the number of toolUse blocks"
Expected vs Actual Result
Expected: Deleted queued messages should not appear in tool results. Only one tool_result per tool_use.
Actual: Deleted message text persists in Task.askResponseText, creating a second tool_result for the same tool_use_id, causing API protocol violation and task failure.
Version & Provider
v3.38.3 (all versions since native tool protocol became default) | Amazon Bedrock | anthropic.claude-sonnet-4-5-20250929-v1:0
Relevant Logs or Errors
Full Error Object
{
"error": {
"timestamp": "2026-01-06T17:27:53.385Z",
"version": "3.38.3",
"provider": "bedrock",
"model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
"details": "Unknown Error: The number of toolResult blocks at messages.27.content exceeds the number of toolUse blocks of previous turn."
}
}Complete Conversation Transcript
Click to expand full conversation history showing the bug
{
"error": {
"timestamp": "2026-01-06T17:27:53.385Z",
"version": "3.38.3",
"provider": "bedrock",
"model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
"details": "Unknown Error: The number of toolResult blocks at messages.27.content exceeds the number of toolUse blocks of previous turn."
},
"history": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "<task>\n[REDACTED TASK DESCRIPTION]\n</task>"
},
{
"type": "text",
"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED FILE PATH]\n\n# VSCode Open Tabs\n\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:13:40.943Z\nUser time zone: Europe/London, UTC+0:00\n\n# Current Cost\n$0.00\n\n# Current Mode\n<slug>code</slug>\n<name>π» Code</name>\n<model>global.anthropic.claude-sonnet-4-5-20250929-v1:0</model>\n<tool_format>native</tool_format>\n\n\n# Current Workspace Directory ([REDACTED PROJECT PATH]) Files\n[EXTENSIVE FILE LIST REDACTED]\n\nYou have not created a todo list yet. Create one with `update_todo_list` if your task is complicated or involves multiple steps.\n</environment_details>"
}
],
"ts": 1767719620983
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "I'll help you fix the final 2 test failures in the AI thread test suite. Let me start by running the tests to see the full error output."
},
{
"type": "tool_use",
"id": "tooluse_vB-VDL3YSG6T9Uy83BxrMA",
"name": "execute_command",
"input": {
"command": "npm run test:stream ai thread",
"cwd": "[REDACTED PROJECT PATH]"
}
}
],
"ts": 1767719629542
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_vB-VDL3YSG6T9Uy83BxrMA",
"content": "Command executed in terminal within working directory '[REDACTED]'. Command execution was not successful, inspect the cause and adjust as needed.\nExit code: 1\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED - Contains hundreds of test results]\n\nβ
2604 passed\nβ 2 failed\n\nRan in: 77 seconds\n\nError: Some of your tests failed."
},
{
"type": "text",
"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED]\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:15:08.646Z\n\n# Current Cost\n$0.04\n\n# Current Mode\n<slug>code</slug>\n<name>π» Code</name>\n</environment_details>"
}
],
"ts": 1767719708649
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "I can see the tests ran with 2604 passing and 2 failed. However, the streamed output doesn't show which tests failed. Let me run the tests without streaming to get a clearer view of the failures."
},
{
"type": "tool_use",
"id": "tooluse_SWU8GLxHSkasPcteXMLJog",
"name": "execute_command",
"input": {
"command": "npm test ai thread",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767719716786
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_SWU8GLxHSkasPcteXMLJog",
"content": "Command executed in terminal within working directory '[REDACTED]'. Command execution was not successful.\nExit code: 1\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED]\n\nβ
2604 passed\nβ 2 failed\n\nRan in: 68 seconds"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:16:26.121Z\n# Current Cost\n$0.24\n</environment_details>"
}
],
"ts": 1767719786124
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "The test output isn't showing which specific tests failed. Let me check the test runner to see if there's a way to get more detailed failure output."
},
{
"type": "tool_use",
"id": "tooluse_iK6gTBosT5aDUC9L7PKk5g",
"name": "read_file",
"input": {
"files": [
{
"path": "test/parallel/main.ts"
}
]
}
}
],
"ts": 1767719793996
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_iK6gTBosT5aDUC9L7PKk5g",
"content": "File: test/parallel/main.ts\nLines 1-698:\n[COMPLETE FILE CONTENT REDACTED - TypeScript test runner configuration with concurrency management, database setup/teardown, and child process forking]"
},
{
"type": "text",
"text": "\n\nNew instructions for task continuation:\n<user_message>\nYes. `npm run test:stream ai thread`\n</user_message>"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:16:41.099Z\n# Current Cost\n$0.27\n</environment_details>"
}
],
"ts": 1767719801152
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "tool_use",
"id": "tooluse_DZgHbSAeS5aoW3BRKayFVQ",
"name": "execute_command",
"input": {
"command": "npm run test:stream ai thread 2>&1 | tee /tmp/test-output.txt",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767719807830
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_DZgHbSAeS5aoW3BRKayFVQ",
"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED - Multiple test suites with thousands of test cases]\n\nβ
2604 passed\nβ 2 failed\n\nRan in: 69 seconds"
},
{
"type": "text",
"text": "\n\nNew instructions for task continuation:\n<user_message>\nI managed to narrow it down myself.\n\nThere is one failure from `npm test assembleToolDefinitions` and another from `npm test assembleToolExecutions`.\n\nBut the failures themselves are a bit mysterious...\n</user_message>"
},
{
"type": "text",
"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED]\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:34.940Z\n# Current Cost\n$0.37\n\n# Current Workspace Directory ([REDACTED]) Files\n[FILE LIST REDACTED]\n</environment_details>"
}
],
"ts": 1767720154993
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "Let me run these specific tests to see the failures:"
},
{
"type": "tool_use",
"id": "tooluse_oK42_8F4RH-BjKrnq4OSHA",
"name": "execute_command",
"input": {
"command": "npm test assembleToolDefinitions",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767720163479
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_oK42_8F4RH-BjKrnq4OSHA",
"content": "Command executed in terminal. Command execution was not successful.\nExit code: 1\nOutput:\n\n[TEST HEADER REDACTED]\n\nβ
12 passed\nβ 1 failed"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:47.192Z\n# Current Cost\n$0.58\n</environment_details>"
}
],
"ts": 1767720167193
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "tool_use",
"id": "tooluse_fcrpWLhzTH2FtAdl-KLPHg",
"name": "execute_command",
"input": {
"command": "npm run test:stream assembleToolDefinitions",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767720174139
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_fcrpWLhzTH2FtAdl-KLPHg",
"content": "Command executed in terminal. Exit code: 1\nOutput:\n\n[TEST BANNER REDACTED]\n\nβ
12 passing (96ms)\nβ 1 failing\n\n 1) [REDACTED TEST SUITE NAME]\n Test suite failed to load because...:\n TypeError: suite.default is not a function\n at Suite.<anonymous> (test/parallel/route.ts:45:17)\n at [REDACTED PATH]\n at processTicksAndRejections (node:internal/process/task_queues:105:5)"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:57.465Z\n# Current Cost\n$0.63\n</environment_details>"
}
],
"ts": 1767720177466
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "I can see the issue! The error is `TypeError: suite.default is not a function` at `test/parallel/route.ts:45:17`. This suggests the test index file isn't exporting properly. Let me check the test files:"
},
{
"type": "tool_use",
"id": "tooluse_T9YVk0-sRZOwfgpuL57pLA",
"name": "read_file",
"input": {
"files": [
{
"path": "[REDACTED TEST FILE PATH 1]"
},
{
"path": "[REDACTED TEST FILE PATH 2]"
}
]
}
}
],
"ts": 1767720185141
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_T9YVk0-sRZOwfgpuL57pLA",
"content": "File: [REDACTED PATH 1]\nLines 1-12:\n[FILE CONTENT REDACTED - Test imports and describe block without default export]\n\nFile: [REDACTED PATH 2]\nLines 1-16:\n[FILE CONTENT REDACTED - Test imports and describe block without default export]"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:05.263Z\n# Current Cost\n$0.69\n</environment_details>"
}
],
"ts": 1767720185265
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "I can see the issue! The test index files are missing the `export default` statement. The test files just have a `describe` block but no export. Let me check what format is expected:"
},
{
"type": "tool_use",
"id": "tooluse_JHhlAIwpQnqGSg9CScbYUg",
"name": "read_file",
"input": {
"files": [
{
"path": "test/parallel/route.ts"
}
]
}
}
],
"ts": 1767720192929
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_JHhlAIwpQnqGSg9CScbYUg",
"content": "File: test/parallel/route.ts\nLines 1-65:\n[FILE CONTENT REDACTED - Test runner that calls suite.default() at line 45]"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:13.054Z\n# Current Cost\n$0.75\n</environment_details>"
}
],
"ts": 1767720193055
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "I see the issue! The test runner expects a `default` export function at line 45: `suite.default()`. Let me check a working test to see the correct pattern:"
},
{
"type": "tool_use",
"id": "tooluse_Hj7A2ol0R8uvAwKKF96NxQ",
"name": "read_file",
"input": {
"files": [
{
"path": "[REDACTED WORKING TEST PATH]"
}
]
}
}
],
"ts": 1767720200285
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_Hj7A2ol0R8uvAwKKF96NxQ",
"content": "File: [REDACTED PATH]\nLines 1-24:\n[FILE CONTENT REDACTED - Shows proper test structure with export default function wrapping describe block]"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:20.609Z\n# Current Cost\n$0.81\n</environment_details>"
}
],
"ts": 1767720200610
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "Perfect! I found the issue. The test index files need to export a default function that wraps the `describe` block. Let me fix both files:"
},
{
"type": "tool_use",
"id": "tooluse_ZncIxJaVQu2HGo9iHbIRcQ",
"name": "apply_diff",
"input": {
"path": "[REDACTED TEST FILE PATH 1]",
"diff": "[DIFF CONTENT REDACTED - Wraps describe block in export default function]"
}
}
],
"ts": 1767720210349
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_ZncIxJaVQu2HGo9iHbIRcQ",
"content": "{\"path\":\"[REDACTED]\",\"operation\":\"modified\",\"notice\":\"You do not need to re-read the file, as you have seen all changes Proceed with the task using these changes as the new baseline.\"}\n<notice>Making multiple related changes in a single apply_diff is more efficient. If other changes are needed in this file, please include them as additional SEARCH/REPLACE blocks.</notice>"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:31.965Z\n# Current Cost\n$0.88\n# Recently Modified Files\n[REDACTED]\n</environment_details>"
}
],
"ts": 1767720211966
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "tool_use",
"id": "tooluse_TbE1v11GSHGqJeM8QecJQg",
"name": "apply_diff",
"input": {
"path": "[REDACTED TEST FILE PATH 2]",
"diff": "[DIFF CONTENT REDACTED - Wraps describe block in export default function]"
}
}
],
"ts": 1767720220516
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_TbE1v11GSHGqJeM8QecJQg",
"content": "{\"path\":\"[REDACTED]\",\"operation\":\"modified\",\"notice\":\"You do not need to re-read the file, as you have seen all changes Proceed with the task using these changes as the new baseline.\"}\n<notice>Making multiple related changes in a single apply_diff is more efficient. If other changes are needed in this file, please include them as additional SEARCH/REPLACE blocks.</notice>"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:41.995Z\n# Current Cost\n$0.96\n# Recently Modified Files\n[REDACTED]\n</environment_details>"
}
],
"ts": 1767720221998
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "Now let me test both files to verify they're fixed:"
},
{
"type": "tool_use",
"id": "tooluse_Tw2pVVAUQt2BcMkhxZQeMA",
"name": "execute_command",
"input": {
"command": "npm test assembleToolDefinitions",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767720228481
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_Tw2pVVAUQt2BcMkhxZQeMA",
"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[TEST OUTPUT REDACTED]\n\nβ
12 passed\nβ 0 failed\n\nRan in: 2 seconds"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:51.715Z\n# Current Cost\n$1.04\n</environment_details>"
}
],
"ts": 1767720231716
},
{
"role": "assistant",
"content": [
{
"type": "reasoning",
"text": "[REDACTED REASONING]",
"summary": []
},
{
"type": "text",
"text": "Excellent! Now let me test the second file:"
},
{
"type": "tool_use",
"id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
"name": "execute_command",
"input": {
"command": "npm test assembleToolExecutions",
"cwd": "[REDACTED]"
}
}
],
"ts": 1767720237664
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[TEST OUTPUT REDACTED]\n\nβ
21 passed\nβ 0 failed\n\nRan in: 2 seconds"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:24:00.871Z\n# Current Cost\n$1.09\n</environment_details>"
}
],
"ts": 1767720240873
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
"content": "{\"status\":\"approved\",\"message\":\"The user approved this operation and provided the following context\",\"feedback\":\"Oh right. These are actually very wrong. Look at '[REDACTED' (see below for file content) FILE PATH]\"}"
},
{
"type": "text",
"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:25:50.210Z\n# Current Cost\n$1.14\n# Current Workspace Directory ([REDACTED]) Files\n[EXTENSIVE FILE LIST REDACTED]\n</environment_details>"
}
],
"ts": 1767720350244
}
]
}Key Observation
The critical evidence is in the final two user messages for tool_use_id tooluse_PbLZjpT1QZSfWtIIWnMF4Q:
- First tool_result (timestamp: 1767720240873): Valid command output "β 21 passed"
- Second tool_result (timestamp: 1767720350244): User feedback "Oh right. These are actually very wrong..."
The problem: Two tool_result blocks exist for the same tool_use_id, but there's only ONE tool_use block in the previous assistant message. This violates the API protocol requiring exactly one tool_result per tool_use.
The smoking gun: The timestamp gap (109 seconds) between the two tool_results indicates the user queued the feedback message while the command was running, then deleted it, but the system still added it as a second tool_result.
Root Cause Analysis
The Problem
A race condition between message queue dequeue and message deletion causes deleted queued message text to persist in task internal state (Task.askResponseText) and be incorrectly included as duplicate tool_result blocks. However, the validation function validateAndFixToolResultIds() fails to detect this because it only checks if a tool_result matches a tool_use in the previous assistant message, but doesn't check if that tool_use_id has already been paired with a tool_result in a previous user message in the conversation history.
Specific Mechanism
- Tool execution starts β Agent waits for completion/approval
- User queues message β Added to
MessageQueueService._messageswith unique ID - Early dequeue (the race) β While executing,
Task.ask()finds queued message, dequeues it, stores text inTask.askResponseText(message gone from queue but text persists in task state) - User deletes message β
removeQueuedMessageevent sent, but message already dequeued, so removal is no-op. Critical gap: No cleanup ofTask.askResponseText - Tool completes β
presentAssistantMessage.tsreadsTask.askResponseText(contains deleted message!), createstool_resultwith deleted text - First tool_result sent β Valid command output, API call succeeds
- Second tool_result created (orphan) β Deleted message text still in
Task.askResponseText, creates secondtool_resultfor sametool_use_id:Assistant[tool_use] β User1[tool_result] β User2[tool_result]β ORPHAN! - API rejection β "2 tool_result blocks > 1 tool_use block", task enters unrecoverable state
Code Paths
Message Queue Dequeue:
// src/core/task/Task.ts (around line 1238-1256)
if (!this.messageQueueService.isEmpty()) {
const message = this.messageQueueService.dequeueMessage()
if (message) {
// For tool approval, include queued text as feedback
if (type === "tool" || type === "command" || ...) {
this.handleWebviewAskResponse("yesButtonClicked", message.text, message.images)
// ^^^^^^^^^^^^ Stored here!
}
}
}State Persistence (The Symptom):
// src/core/task/Task.ts (around line 1291-1297)
handleWebviewAskResponse(askResponse: ClineAskResponse, text?: string, images?: string[]) {
this.cancelAutoApprovalTimeout()
this.askResponse = askResponse
this.askResponseText = text // β PERSISTS even after message deleted!
this.askResponseImages = images
// ... rest of method
}Message Deletion (No Cleanup):
// src/core/message-queue/MessageQueueService.ts (around line 54-64)
public removeMessage(id: string): boolean {
const { index, message } = this.findMessage(id)
if (!message) return false // β Already dequeued, returns false
this._messages.splice(index, 1)
this.emit("stateChanged", this._messages)
return true
}
// β Problem: Does NOT clear Task.askResponseText!Tool Result Creation (Where Duplicate Appears):
// src/core/assistant-message/presentAssistantMessage.ts (around line 607-612)
if (text) {
// β text = Task.askResponseText = deleted message!
await cline.say("user_feedback", text, images)
pushToolResult(formatResponse.toolResult(formatResponse.toolApprovedWithFeedback(text, toolProtocol), images)) // β Creates SECOND tool_result with deleted message
}The Core Issue
The root cause is a state cleanup gap: Once a queued message is dequeued and stored in Task.askResponseText, there is no mechanism to clear it when the message is subsequently deleted from the UI. Message deletion only affects the MessageQueueService._messages array, not the task's internal state variables.
However, the immediate solution should be defense-in-depth validation to filter duplicates before they reach the API, as this protects against this issue and many other potential tool result edge cases.
Recommended Fix: Cross-Message Orphan Filtering (Defense-in-Depth)
Approach
Enhance validateAndFixToolResultIds() to track which tool_use_ids have already been paired with tool_result blocks in previous user messages in the conversation history, then filter out any tool_result in the current message that references a tool_use_id that's already been paired.
Key concept: Each tool_use can be paired with exactly ONE tool_result. Once a pairing exists in a previous user message, any additional tool_result blocks referencing that same tool_use_id are orphans that must be filtered out. This prevents orphaned tool_results from being submitted when they reference tool_uses that are no longer available (already paired in earlier messages).
Why This Approach
- Single-File Change: Only modifies
src/core/task/validateToolResultIds.ts - Defense-in-Depth: Protects against this issue AND other potential cross-message tool result edge cases
- Low Complexity: Clear logic that scans conversation history to build paired ID set
- Broad Benefit: Handles multiple scenarios where tool_results get separated across messages
- Non-Breaking: No impact on existing functionality, just adds safety
Implementation
File Modified: src/core/task/validateToolResultIds.ts
Location: Insert after finding the previous assistant message (around line 140, after collecting validToolUseIds)
// After finding the previous assistant message and extracting tool_use IDs...
const validToolUseIds = new Set(
prevAssistantMessage.content
.filter((block): block is Anthropic.ToolUseBlockParam => block.type === "tool_use")
.map((toolUse) => toolUse.id),
)
// === NEW: Cross-message orphan filtering ===
// Scan backward through conversation history to find ALL previous user messages
// that came after the assistant message, and collect which tool_use_ids have
// already been paired with tool_results. This prevents orphaned tool_results from
// duplicate responses (e.g., queued message deletion, retry logic, etc.)
const prevAssistantIdx = apiConversationHistory.findIndex((msg) => msg === prevAssistantMessage)
// Get all messages that came after the assistant message (before current)
const messagesSinceAssistant = apiConversationHistory.slice(prevAssistantIdx + 1)
// Collect tool_use_ids that have already been paired with tool_results in previous user messages
const alreadyPairedToolUseIds = new Set<string>()
for (const msg of messagesSinceAssistant) {
if (msg.role === "user" && Array.isArray(msg.content)) {
const previousToolResults = msg.content.filter(
(block): block is Anthropic.ToolResultBlockParam => block.type === "tool_result",
)
previousToolResults.forEach((tr) => alreadyPairedToolUseIds.add(tr.tool_use_id))
}
}
// Now filter tool_results in current message to remove orphans
// A tool_result is an orphan if it references a tool_use that's already been paired
const filteredContent = userMessage.content.filter((block) => {
if (block.type !== "tool_result") {
return true // Keep all non-tool_result blocks
}
// Filter out tool_results that reference tool_uses which are already paired (no longer available)
if (alreadyPairedToolUseIds.has(block.tool_use_id)) {
return false // This is an orphan - tool_use already paired in previous message
}
return true // Keep this tool_result
})
// Update the user message with filtered content
userMessage = {
...userMessage,
content: filteredContent,
}
// Re-extract toolResults from filtered content for subsequent processing
const toolResults = filteredContent.filter(
(block): block is Anthropic.ToolResultBlockParam => block.type === "tool_result",
)
// === END NEW CODE ===
// Rest of existing validation logic continues unchanged...How This Fixes the Bug
Bug Scenario (Cross-Message):
Assistant Message 26: [tool_use(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q")]
User Message 27: [tool_result(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content="β
21 passed")] β Valid pairing
User Message 28: [tool_result(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content="deleted message")] β ORPHAN! (tool_use already paired)
With This Fix:
- Validation runs before submitting user message 28
- Find previous assistant message (index 26)
- Scan forward from index 27 to collect tool_use_ids that have already been paired
- Find message 27 contains tool_result for
tooluse_PbLZjpT1QZSfWtIIWnMF4Q - Add
tooluse_PbLZjpT1QZSfWtIIWnMF4QtoalreadyPairedToolUseIdsset - Filter message 28 content: tool_result for
tooluse_PbLZjpT1QZSfWtIIWnMF4Qalready in set β filter out - Result: Message 28 has no tool_results (or only other content if present) β
- API receives valid message structure: 1 tool_use β 1 tool_result (pairing complete)
- No error, task continues normally
Edge Cases Handled
-
Multiple Different Tool Results (normal case)
- Each has unique
tool_use_id - All kept (correct behavior)
- Each has unique
-
Legitimate Duplicate IDs (shouldn't exist, but if they do)
- First occurrence kept
- Subsequent duplicates filtered
- Prevents API error
-
Mixed Content Types
- Only
tool_resultblocks are checked - Text blocks, images, etc. always kept
- Only
-
No Tool Results
- Filter is no-op
- No performance impact
Testing Strategy
Add to [src/core/task/__tests__/validateToolResultIds.spec.ts]:
describe("validateAndFixToolResultIds - Cross-Message Orphan Filtering", () => {
it("should filter out tool_result in second user message that references already-paired tool_use", () => {
const messages: Anthropic.Messages.MessageParam[] = [
{
role: "assistant",
content: [
{
type: "tool_use",
id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
name: "execute_command",
input: { command: "npm test" },
},
],
},
{
role: "user",
content: [
{ type: "tool_result", tool_use_id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content: "β
21 passed" },
], // Valid pairing
},
{
role: "user",
content: [
{ type: "tool_result", tool_use_id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content: "deleted message" },
], // Orphan!
},
]
const result = validateAndFixToolResultIds(messages)
expect(result[1].content.filter((b: any) => b.type === "tool_result")).toHaveLength(1)
expect(result[2].content.filter((b: any) => b.type === "tool_result")).toHaveLength(0) // Orphan filtered
})
it("should keep tool_results in separate messages if they reference different tool_uses", () => {
// Test with tooluse_123 and tooluse_456 in separate messages - both should be kept
})
it("should preserve non-tool_result content when filtering orphaned tool_results", () => {
// Test that text blocks remain when orphaned tool_result is filtered
})
it("should handle multiple tool_uses with mixed valid and orphaned results", () => {
// Test filtering duplicate for one ID while keeping valid result for another ID
})
})Advantages: Surgical (~30 lines), clear intent, broad protection, no breaking changes, easy review, O(n) performance.
Impact: Before: API errors β task failure. After: Orphans filtered β task continues. Bonus: protects against other cross-message scenarios.
Additional Notes
Why Existing Validation Doesn't Catch This: Current validateAndFixToolResultIds() checks if tool_result's tool_use_id exists in previous assistant message, but doesn't check if that tool_use_id was already paired in a previous user message. The fix adds cross-message orphan tracking by scanning conversation history.
Similar Issue: Terminal Fallback (#10465) - external terminal fails β system retries β user approval arrives during fallback β duplicate tool_results. Same pattern: native protocol's strict validation exposed latent race conditions previously tolerated under XML protocol.
Historical Context
Type: Latent bug exposed by architectural change
Timeline: Message queue dequeue logic existed in codebase for extended period (2024-2025), became problematic on December 18, 2025 (v3.36.13) when native tool protocol became default.
Why Surface Now?
- XML protocol (pre-Dec 2025): No strict 1:1 tool_use/tool_result enforcement, duplicates tolerated, race condition invisible
- Native protocol (post-Dec 2025): Enforces exactly one
tool_resultpertool_use_id, API validates and rejects mismatches, race condition became fatal
Related Issues: Native tool rollout (Nov-Dec 2025) exposed multiple tool result edge cases including PR #9248, #9363, #9952, #10015, #10027, #10021, #10186 (the trigger), and #10465 (similar terminal fallback issue).
Key Insight: Native protocol's strict validation exposed latent race conditions in tool result generation that were previously invisible under XML protocol. Validation layer defense-in-depth is key protection.
Impact Assessment
- Severity: High - immediate task failure, no recovery, affects queued message workflows
- Frequency: Low-Medium - requires specific timing (queue during execution, delete before completion)
- Fix Complexity: Low - single-file change (~30 lines), no architectural refactoring, clear tests, easy review
Conclusion
Race condition in message queue state management causes deleted message text to persist in Task.askResponseText. Combined with validation gap (doesn't track already-paired tool_use_ids across messages), this creates duplicate tool_results.
Recommended fix: Cross-message orphan filtering enhancement to validateAndFixToolResultIds() - scans history, filters tool_results referencing already-paired tool_uses. Single-file change (~30 lines), immediate protection, handles cross-message tool result scenarios. State cleanup on message deletion should be addressed separately.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status