Skip to content

[BUG] Deleted Queued Messages Create Duplicate Tool ResultsΒ #10494

@nabilfreeman

Description

@nabilfreeman

Deleted Queued Messages Create Duplicate Tool Results

Problem

When a user queues a message while a tool is executing, then deletes that queued message before tool completion, the deleted message text persists in internal task state and is incorrectly added as an extra tool_result block, causing a tool_use/tool_result count mismatch that violates Anthropic's API protocol constraints.

Context

This affects users who:

  1. Queue feedback messages while tools are executing (especially long-running terminal commands)
  2. Change their mind and delete the queued message before the tool completes
  3. Are using native tool protocol (default since v3.36.13, December 18, 2025)

Reproduction Steps

  1. Environment: v3.38.3, Amazon Bedrock, native tool protocol enabled (default)

  2. Actions:

    • Start task with long-running execute_command tool
    • While executing, queue a message
    • Delete the queued message before tool completes
    • Wait for completion
  3. Result:

    • System creates first tool_result with command output (correct)
    • System adds second tool_result for same tool_use_id with deleted message text (incorrect)
    • API error: "The number of toolResult blocks... exceeds the number of toolUse blocks"

Expected vs Actual Result

Expected: Deleted queued messages should not appear in tool results. Only one tool_result per tool_use.

Actual: Deleted message text persists in Task.askResponseText, creating a second tool_result for the same tool_use_id, causing API protocol violation and task failure.

Version & Provider

v3.38.3 (all versions since native tool protocol became default) | Amazon Bedrock | anthropic.claude-sonnet-4-5-20250929-v1:0

Relevant Logs or Errors

Full Error Object

{
	"error": {
		"timestamp": "2026-01-06T17:27:53.385Z",
		"version": "3.38.3",
		"provider": "bedrock",
		"model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
		"details": "Unknown Error: The number of toolResult blocks at messages.27.content exceeds the number of toolUse blocks of previous turn."
	}
}

Complete Conversation Transcript

Click to expand full conversation history showing the bug
{
	"error": {
		"timestamp": "2026-01-06T17:27:53.385Z",
		"version": "3.38.3",
		"provider": "bedrock",
		"model": "anthropic.claude-sonnet-4-5-20250929-v1:0",
		"details": "Unknown Error: The number of toolResult blocks at messages.27.content exceeds the number of toolUse blocks of previous turn."
	},
	"history": [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "<task>\n[REDACTED TASK DESCRIPTION]\n</task>"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED FILE PATH]\n\n# VSCode Open Tabs\n\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:13:40.943Z\nUser time zone: Europe/London, UTC+0:00\n\n# Current Cost\n$0.00\n\n# Current Mode\n<slug>code</slug>\n<name>πŸ’» Code</name>\n<model>global.anthropic.claude-sonnet-4-5-20250929-v1:0</model>\n<tool_format>native</tool_format>\n\n\n# Current Workspace Directory ([REDACTED PROJECT PATH]) Files\n[EXTENSIVE FILE LIST REDACTED]\n\nYou have not created a todo list yet. Create one with `update_todo_list` if your task is complicated or involves multiple steps.\n</environment_details>"
				}
			],
			"ts": 1767719620983
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "I'll help you fix the final 2 test failures in the AI thread test suite. Let me start by running the tests to see the full error output."
				},
				{
					"type": "tool_use",
					"id": "tooluse_vB-VDL3YSG6T9Uy83BxrMA",
					"name": "execute_command",
					"input": {
						"command": "npm run test:stream ai thread",
						"cwd": "[REDACTED PROJECT PATH]"
					}
				}
			],
			"ts": 1767719629542
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_vB-VDL3YSG6T9Uy83BxrMA",
					"content": "Command executed in terminal within working directory '[REDACTED]'. Command execution was not successful, inspect the cause and adjust as needed.\nExit code: 1\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED - Contains hundreds of test results]\n\nβœ… 2604 passed\n❌ 2 failed\n\nRan in: 77 seconds\n\nError: Some of your tests failed."
				},
				{
					"type": "text",
					"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED]\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:15:08.646Z\n\n# Current Cost\n$0.04\n\n# Current Mode\n<slug>code</slug>\n<name>πŸ’» Code</name>\n</environment_details>"
				}
			],
			"ts": 1767719708649
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "I can see the tests ran with 2604 passing and 2 failed. However, the streamed output doesn't show which tests failed. Let me run the tests without streaming to get a clearer view of the failures."
				},
				{
					"type": "tool_use",
					"id": "tooluse_SWU8GLxHSkasPcteXMLJog",
					"name": "execute_command",
					"input": {
						"command": "npm test ai thread",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767719716786
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_SWU8GLxHSkasPcteXMLJog",
					"content": "Command executed in terminal within working directory '[REDACTED]'. Command execution was not successful.\nExit code: 1\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED]\n\nβœ… 2604 passed\n❌ 2 failed\n\nRan in: 68 seconds"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:16:26.121Z\n# Current Cost\n$0.24\n</environment_details>"
				}
			],
			"ts": 1767719786124
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "The test output isn't showing which specific tests failed. Let me check the test runner to see if there's a way to get more detailed failure output."
				},
				{
					"type": "tool_use",
					"id": "tooluse_iK6gTBosT5aDUC9L7PKk5g",
					"name": "read_file",
					"input": {
						"files": [
							{
								"path": "test/parallel/main.ts"
							}
						]
					}
				}
			],
			"ts": 1767719793996
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_iK6gTBosT5aDUC9L7PKk5g",
					"content": "File: test/parallel/main.ts\nLines 1-698:\n[COMPLETE FILE CONTENT REDACTED - TypeScript test runner configuration with concurrency management, database setup/teardown, and child process forking]"
				},
				{
					"type": "text",
					"text": "\n\nNew instructions for task continuation:\n<user_message>\nYes. `npm run test:stream ai thread`\n</user_message>"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:16:41.099Z\n# Current Cost\n$0.27\n</environment_details>"
				}
			],
			"ts": 1767719801152
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "tool_use",
					"id": "tooluse_DZgHbSAeS5aoW3BRKayFVQ",
					"name": "execute_command",
					"input": {
						"command": "npm run test:stream ai thread 2>&1 | tee /tmp/test-output.txt",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767719807830
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_DZgHbSAeS5aoW3BRKayFVQ",
					"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[EXTENSIVE TEST OUTPUT REDACTED - Multiple test suites with thousands of test cases]\n\nβœ… 2604 passed\n❌ 2 failed\n\nRan in: 69 seconds"
				},
				{
					"type": "text",
					"text": "\n\nNew instructions for task continuation:\n<user_message>\nI managed to narrow it down myself.\n\nThere is one failure from `npm test assembleToolDefinitions` and another from `npm test assembleToolExecutions`.\n\nBut the failures themselves are a bit mysterious...\n</user_message>"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# VSCode Visible Files\n[REDACTED]\n\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:34.940Z\n# Current Cost\n$0.37\n\n# Current Workspace Directory ([REDACTED]) Files\n[FILE LIST REDACTED]\n</environment_details>"
				}
			],
			"ts": 1767720154993
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "Let me run these specific tests to see the failures:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_oK42_8F4RH-BjKrnq4OSHA",
					"name": "execute_command",
					"input": {
						"command": "npm test assembleToolDefinitions",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767720163479
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_oK42_8F4RH-BjKrnq4OSHA",
					"content": "Command executed in terminal. Command execution was not successful.\nExit code: 1\nOutput:\n\n[TEST HEADER REDACTED]\n\nβœ… 12 passed\n❌ 1 failed"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:47.192Z\n# Current Cost\n$0.58\n</environment_details>"
				}
			],
			"ts": 1767720167193
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "tool_use",
					"id": "tooluse_fcrpWLhzTH2FtAdl-KLPHg",
					"name": "execute_command",
					"input": {
						"command": "npm run test:stream assembleToolDefinitions",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767720174139
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_fcrpWLhzTH2FtAdl-KLPHg",
					"content": "Command executed in terminal. Exit code: 1\nOutput:\n\n[TEST BANNER REDACTED]\n\nβœ… 12 passing (96ms)\n❌ 1 failing\n\n  1) [REDACTED TEST SUITE NAME]\n       Test suite failed to load because...:\n     TypeError: suite.default is not a function\n      at Suite.<anonymous> (test/parallel/route.ts:45:17)\n      at [REDACTED PATH]\n      at processTicksAndRejections (node:internal/process/task_queues:105:5)"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:22:57.465Z\n# Current Cost\n$0.63\n</environment_details>"
				}
			],
			"ts": 1767720177466
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "I can see the issue! The error is `TypeError: suite.default is not a function` at `test/parallel/route.ts:45:17`. This suggests the test index file isn't exporting properly. Let me check the test files:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_T9YVk0-sRZOwfgpuL57pLA",
					"name": "read_file",
					"input": {
						"files": [
							{
								"path": "[REDACTED TEST FILE PATH 1]"
							},
							{
								"path": "[REDACTED TEST FILE PATH 2]"
							}
						]
					}
				}
			],
			"ts": 1767720185141
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_T9YVk0-sRZOwfgpuL57pLA",
					"content": "File: [REDACTED PATH 1]\nLines 1-12:\n[FILE CONTENT REDACTED - Test imports and describe block without default export]\n\nFile: [REDACTED PATH 2]\nLines 1-16:\n[FILE CONTENT REDACTED - Test imports and describe block without default export]"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:05.263Z\n# Current Cost\n$0.69\n</environment_details>"
				}
			],
			"ts": 1767720185265
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "I can see the issue! The test index files are missing the `export default` statement. The test files just have a `describe` block but no export. Let me check what format is expected:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_JHhlAIwpQnqGSg9CScbYUg",
					"name": "read_file",
					"input": {
						"files": [
							{
								"path": "test/parallel/route.ts"
							}
						]
					}
				}
			],
			"ts": 1767720192929
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_JHhlAIwpQnqGSg9CScbYUg",
					"content": "File: test/parallel/route.ts\nLines 1-65:\n[FILE CONTENT REDACTED - Test runner that calls suite.default() at line 45]"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:13.054Z\n# Current Cost\n$0.75\n</environment_details>"
				}
			],
			"ts": 1767720193055
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "I see the issue! The test runner expects a `default` export function at line 45: `suite.default()`. Let me check a working test to see the correct pattern:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_Hj7A2ol0R8uvAwKKF96NxQ",
					"name": "read_file",
					"input": {
						"files": [
							{
								"path": "[REDACTED WORKING TEST PATH]"
							}
						]
					}
				}
			],
			"ts": 1767720200285
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_Hj7A2ol0R8uvAwKKF96NxQ",
					"content": "File: [REDACTED PATH]\nLines 1-24:\n[FILE CONTENT REDACTED - Shows proper test structure with export default function wrapping describe block]"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:20.609Z\n# Current Cost\n$0.81\n</environment_details>"
				}
			],
			"ts": 1767720200610
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "Perfect! I found the issue. The test index files need to export a default function that wraps the `describe` block. Let me fix both files:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_ZncIxJaVQu2HGo9iHbIRcQ",
					"name": "apply_diff",
					"input": {
						"path": "[REDACTED TEST FILE PATH 1]",
						"diff": "[DIFF CONTENT REDACTED - Wraps describe block in export default function]"
					}
				}
			],
			"ts": 1767720210349
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_ZncIxJaVQu2HGo9iHbIRcQ",
					"content": "{\"path\":\"[REDACTED]\",\"operation\":\"modified\",\"notice\":\"You do not need to re-read the file, as you have seen all changes Proceed with the task using these changes as the new baseline.\"}\n<notice>Making multiple related changes in a single apply_diff is more efficient. If other changes are needed in this file, please include them as additional SEARCH/REPLACE blocks.</notice>"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:31.965Z\n# Current Cost\n$0.88\n# Recently Modified Files\n[REDACTED]\n</environment_details>"
				}
			],
			"ts": 1767720211966
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "tool_use",
					"id": "tooluse_TbE1v11GSHGqJeM8QecJQg",
					"name": "apply_diff",
					"input": {
						"path": "[REDACTED TEST FILE PATH 2]",
						"diff": "[DIFF CONTENT REDACTED - Wraps describe block in export default function]"
					}
				}
			],
			"ts": 1767720220516
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_TbE1v11GSHGqJeM8QecJQg",
					"content": "{\"path\":\"[REDACTED]\",\"operation\":\"modified\",\"notice\":\"You do not need to re-read the file, as you have seen all changes Proceed with the task using these changes as the new baseline.\"}\n<notice>Making multiple related changes in a single apply_diff is more efficient. If other changes are needed in this file, please include them as additional SEARCH/REPLACE blocks.</notice>"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:41.995Z\n# Current Cost\n$0.96\n# Recently Modified Files\n[REDACTED]\n</environment_details>"
				}
			],
			"ts": 1767720221998
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "Now let me test both files to verify they're fixed:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_Tw2pVVAUQt2BcMkhxZQeMA",
					"name": "execute_command",
					"input": {
						"command": "npm test assembleToolDefinitions",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767720228481
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_Tw2pVVAUQt2BcMkhxZQeMA",
					"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[TEST OUTPUT REDACTED]\n\nβœ… 12 passed\n❌ 0 failed\n\nRan in: 2 seconds"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:23:51.715Z\n# Current Cost\n$1.04\n</environment_details>"
				}
			],
			"ts": 1767720231716
		},
		{
			"role": "assistant",
			"content": [
				{
					"type": "reasoning",
					"text": "[REDACTED REASONING]",
					"summary": []
				},
				{
					"type": "text",
					"text": "Excellent! Now let me test the second file:"
				},
				{
					"type": "tool_use",
					"id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
					"name": "execute_command",
					"input": {
						"command": "npm test assembleToolExecutions",
						"cwd": "[REDACTED]"
					}
				}
			],
			"ts": 1767720237664
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
					"content": "Command executed in terminal within working directory '[REDACTED]'. Exit code: 0\nOutput:\n\n[TEST OUTPUT REDACTED]\n\nβœ… 21 passed\n❌ 0 failed\n\nRan in: 2 seconds"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:24:00.871Z\n# Current Cost\n$1.09\n</environment_details>"
				}
			],
			"ts": 1767720240873
		},
		{
			"role": "user",
			"content": [
				{
					"type": "tool_result",
					"tool_use_id": "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
					"content": "{\"status\":\"approved\",\"message\":\"The user approved this operation and provided the following context\",\"feedback\":\"Oh right. These are actually very wrong. Look at '[REDACTED' (see below for file content) FILE PATH]\"}"
				},
				{
					"type": "text",
					"text": "<environment_details>\n# Current Time\nCurrent time in ISO 8601 UTC format: 2026-01-06T17:25:50.210Z\n# Current Cost\n$1.14\n# Current Workspace Directory ([REDACTED]) Files\n[EXTENSIVE FILE LIST REDACTED]\n</environment_details>"
				}
			],
			"ts": 1767720350244
		}
	]
}

Key Observation

The critical evidence is in the final two user messages for tool_use_id tooluse_PbLZjpT1QZSfWtIIWnMF4Q:

  1. First tool_result (timestamp: 1767720240873): Valid command output "βœ… 21 passed"
  2. Second tool_result (timestamp: 1767720350244): User feedback "Oh right. These are actually very wrong..."

The problem: Two tool_result blocks exist for the same tool_use_id, but there's only ONE tool_use block in the previous assistant message. This violates the API protocol requiring exactly one tool_result per tool_use.

The smoking gun: The timestamp gap (109 seconds) between the two tool_results indicates the user queued the feedback message while the command was running, then deleted it, but the system still added it as a second tool_result.


Root Cause Analysis

The Problem

A race condition between message queue dequeue and message deletion causes deleted queued message text to persist in task internal state (Task.askResponseText) and be incorrectly included as duplicate tool_result blocks. However, the validation function validateAndFixToolResultIds() fails to detect this because it only checks if a tool_result matches a tool_use in the previous assistant message, but doesn't check if that tool_use_id has already been paired with a tool_result in a previous user message in the conversation history.

Specific Mechanism

  1. Tool execution starts β†’ Agent waits for completion/approval
  2. User queues message β†’ Added to MessageQueueService._messages with unique ID
  3. Early dequeue (the race) β†’ While executing, Task.ask() finds queued message, dequeues it, stores text in Task.askResponseText (message gone from queue but text persists in task state)
  4. User deletes message β†’ removeQueuedMessage event sent, but message already dequeued, so removal is no-op. Critical gap: No cleanup of Task.askResponseText
  5. Tool completes β†’ presentAssistantMessage.ts reads Task.askResponseText (contains deleted message!), creates tool_result with deleted text
  6. First tool_result sent β†’ Valid command output, API call succeeds
  7. Second tool_result created (orphan) β†’ Deleted message text still in Task.askResponseText, creates second tool_result for same tool_use_id: Assistant[tool_use] β†’ User1[tool_result] β†’ User2[tool_result] ← ORPHAN!
  8. API rejection β†’ "2 tool_result blocks > 1 tool_use block", task enters unrecoverable state

Code Paths

Message Queue Dequeue:

// src/core/task/Task.ts (around line 1238-1256)
if (!this.messageQueueService.isEmpty()) {
    const message = this.messageQueueService.dequeueMessage()
    if (message) {
        // For tool approval, include queued text as feedback
        if (type === "tool" || type === "command" || ...) {
            this.handleWebviewAskResponse("yesButtonClicked", message.text, message.images)
            //                                                  ^^^^^^^^^^^^ Stored here!
        }
    }
}

State Persistence (The Symptom):

// src/core/task/Task.ts (around line 1291-1297)
handleWebviewAskResponse(askResponse: ClineAskResponse, text?: string, images?: string[]) {
    this.cancelAutoApprovalTimeout()
    this.askResponse = askResponse
    this.askResponseText = text         // ← PERSISTS even after message deleted!
    this.askResponseImages = images
    // ... rest of method
}

Message Deletion (No Cleanup):

// src/core/message-queue/MessageQueueService.ts (around line 54-64)
public removeMessage(id: string): boolean {
    const { index, message } = this.findMessage(id)
    if (!message) return false  // ← Already dequeued, returns false

    this._messages.splice(index, 1)
    this.emit("stateChanged", this._messages)
    return true
}
// ❌ Problem: Does NOT clear Task.askResponseText!

Tool Result Creation (Where Duplicate Appears):

// src/core/assistant-message/presentAssistantMessage.ts (around line 607-612)
if (text) {
	// ← text = Task.askResponseText = deleted message!
	await cline.say("user_feedback", text, images)
	pushToolResult(formatResponse.toolResult(formatResponse.toolApprovedWithFeedback(text, toolProtocol), images)) // ← Creates SECOND tool_result with deleted message
}

The Core Issue

The root cause is a state cleanup gap: Once a queued message is dequeued and stored in Task.askResponseText, there is no mechanism to clear it when the message is subsequently deleted from the UI. Message deletion only affects the MessageQueueService._messages array, not the task's internal state variables.

However, the immediate solution should be defense-in-depth validation to filter duplicates before they reach the API, as this protects against this issue and many other potential tool result edge cases.


Recommended Fix: Cross-Message Orphan Filtering (Defense-in-Depth)

Approach

Enhance validateAndFixToolResultIds() to track which tool_use_ids have already been paired with tool_result blocks in previous user messages in the conversation history, then filter out any tool_result in the current message that references a tool_use_id that's already been paired.

Key concept: Each tool_use can be paired with exactly ONE tool_result. Once a pairing exists in a previous user message, any additional tool_result blocks referencing that same tool_use_id are orphans that must be filtered out. This prevents orphaned tool_results from being submitted when they reference tool_uses that are no longer available (already paired in earlier messages).

Why This Approach

  1. Single-File Change: Only modifies src/core/task/validateToolResultIds.ts
  2. Defense-in-Depth: Protects against this issue AND other potential cross-message tool result edge cases
  3. Low Complexity: Clear logic that scans conversation history to build paired ID set
  4. Broad Benefit: Handles multiple scenarios where tool_results get separated across messages
  5. Non-Breaking: No impact on existing functionality, just adds safety

Implementation

File Modified: src/core/task/validateToolResultIds.ts

Location: Insert after finding the previous assistant message (around line 140, after collecting validToolUseIds)

// After finding the previous assistant message and extracting tool_use IDs...
const validToolUseIds = new Set(
	prevAssistantMessage.content
		.filter((block): block is Anthropic.ToolUseBlockParam => block.type === "tool_use")
		.map((toolUse) => toolUse.id),
)

// === NEW: Cross-message orphan filtering ===
// Scan backward through conversation history to find ALL previous user messages
// that came after the assistant message, and collect which tool_use_ids have
// already been paired with tool_results. This prevents orphaned tool_results from
// duplicate responses (e.g., queued message deletion, retry logic, etc.)

const prevAssistantIdx = apiConversationHistory.findIndex((msg) => msg === prevAssistantMessage)

// Get all messages that came after the assistant message (before current)
const messagesSinceAssistant = apiConversationHistory.slice(prevAssistantIdx + 1)

// Collect tool_use_ids that have already been paired with tool_results in previous user messages
const alreadyPairedToolUseIds = new Set<string>()
for (const msg of messagesSinceAssistant) {
	if (msg.role === "user" && Array.isArray(msg.content)) {
		const previousToolResults = msg.content.filter(
			(block): block is Anthropic.ToolResultBlockParam => block.type === "tool_result",
		)
		previousToolResults.forEach((tr) => alreadyPairedToolUseIds.add(tr.tool_use_id))
	}
}

// Now filter tool_results in current message to remove orphans
// A tool_result is an orphan if it references a tool_use that's already been paired
const filteredContent = userMessage.content.filter((block) => {
	if (block.type !== "tool_result") {
		return true // Keep all non-tool_result blocks
	}

	// Filter out tool_results that reference tool_uses which are already paired (no longer available)
	if (alreadyPairedToolUseIds.has(block.tool_use_id)) {
		return false // This is an orphan - tool_use already paired in previous message
	}

	return true // Keep this tool_result
})

// Update the user message with filtered content
userMessage = {
	...userMessage,
	content: filteredContent,
}

// Re-extract toolResults from filtered content for subsequent processing
const toolResults = filteredContent.filter(
	(block): block is Anthropic.ToolResultBlockParam => block.type === "tool_result",
)
// === END NEW CODE ===

// Rest of existing validation logic continues unchanged...

How This Fixes the Bug

Bug Scenario (Cross-Message):

Assistant Message 26: [tool_use(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q")]
User Message 27:     [tool_result(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content="βœ… 21 passed")] ← Valid pairing
User Message 28:     [tool_result(id="tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content="deleted message")] ← ORPHAN! (tool_use already paired)

With This Fix:

  1. Validation runs before submitting user message 28
  2. Find previous assistant message (index 26)
  3. Scan forward from index 27 to collect tool_use_ids that have already been paired
  4. Find message 27 contains tool_result for tooluse_PbLZjpT1QZSfWtIIWnMF4Q
  5. Add tooluse_PbLZjpT1QZSfWtIIWnMF4Q to alreadyPairedToolUseIds set
  6. Filter message 28 content: tool_result for tooluse_PbLZjpT1QZSfWtIIWnMF4Q already in set β†’ filter out
  7. Result: Message 28 has no tool_results (or only other content if present) βœ…
  8. API receives valid message structure: 1 tool_use β†’ 1 tool_result (pairing complete)
  9. No error, task continues normally

Edge Cases Handled

  1. Multiple Different Tool Results (normal case)

    • Each has unique tool_use_id
    • All kept (correct behavior)
  2. Legitimate Duplicate IDs (shouldn't exist, but if they do)

    • First occurrence kept
    • Subsequent duplicates filtered
    • Prevents API error
  3. Mixed Content Types

    • Only tool_result blocks are checked
    • Text blocks, images, etc. always kept
  4. No Tool Results

    • Filter is no-op
    • No performance impact

Testing Strategy

Add to [src/core/task/__tests__/validateToolResultIds.spec.ts]:

describe("validateAndFixToolResultIds - Cross-Message Orphan Filtering", () => {
	it("should filter out tool_result in second user message that references already-paired tool_use", () => {
		const messages: Anthropic.Messages.MessageParam[] = [
			{
				role: "assistant",
				content: [
					{
						type: "tool_use",
						id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q",
						name: "execute_command",
						input: { command: "npm test" },
					},
				],
			},
			{
				role: "user",
				content: [
					{ type: "tool_result", tool_use_id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content: "βœ… 21 passed" },
				], // Valid pairing
			},
			{
				role: "user",
				content: [
					{ type: "tool_result", tool_use_id: "tooluse_PbLZjpT1QZSfWtIIWnMF4Q", content: "deleted message" },
				], // Orphan!
			},
		]

		const result = validateAndFixToolResultIds(messages)

		expect(result[1].content.filter((b: any) => b.type === "tool_result")).toHaveLength(1)
		expect(result[2].content.filter((b: any) => b.type === "tool_result")).toHaveLength(0) // Orphan filtered
	})

	it("should keep tool_results in separate messages if they reference different tool_uses", () => {
		// Test with tooluse_123 and tooluse_456 in separate messages - both should be kept
	})

	it("should preserve non-tool_result content when filtering orphaned tool_results", () => {
		// Test that text blocks remain when orphaned tool_result is filtered
	})

	it("should handle multiple tool_uses with mixed valid and orphaned results", () => {
		// Test filtering duplicate for one ID while keeping valid result for another ID
	})
})

Advantages: Surgical (~30 lines), clear intent, broad protection, no breaking changes, easy review, O(n) performance.

Impact: Before: API errors β†’ task failure. After: Orphans filtered β†’ task continues. Bonus: protects against other cross-message scenarios.


Additional Notes

Why Existing Validation Doesn't Catch This: Current validateAndFixToolResultIds() checks if tool_result's tool_use_id exists in previous assistant message, but doesn't check if that tool_use_id was already paired in a previous user message. The fix adds cross-message orphan tracking by scanning conversation history.

Similar Issue: Terminal Fallback (#10465) - external terminal fails β†’ system retries β†’ user approval arrives during fallback β†’ duplicate tool_results. Same pattern: native protocol's strict validation exposed latent race conditions previously tolerated under XML protocol.


Historical Context

Type: Latent bug exposed by architectural change

Timeline: Message queue dequeue logic existed in codebase for extended period (2024-2025), became problematic on December 18, 2025 (v3.36.13) when native tool protocol became default.

Why Surface Now?

  • XML protocol (pre-Dec 2025): No strict 1:1 tool_use/tool_result enforcement, duplicates tolerated, race condition invisible
  • Native protocol (post-Dec 2025): Enforces exactly one tool_result per tool_use_id, API validates and rejects mismatches, race condition became fatal

Related Issues: Native tool rollout (Nov-Dec 2025) exposed multiple tool result edge cases including PR #9248, #9363, #9952, #10015, #10027, #10021, #10186 (the trigger), and #10465 (similar terminal fallback issue).

Key Insight: Native protocol's strict validation exposed latent race conditions in tool result generation that were previously invisible under XML protocol. Validation layer defense-in-depth is key protection.


Impact Assessment

  • Severity: High - immediate task failure, no recovery, affects queued message workflows
  • Frequency: Low-Medium - requires specific timing (queue during execution, delete before completion)
  • Fix Complexity: Low - single-file change (~30 lines), no architectural refactoring, clear tests, easy review

Conclusion

Race condition in message queue state management causes deleted message text to persist in Task.askResponseText. Combined with validation gap (doesn't track already-paired tool_use_ids across messages), this creates duplicate tool_results.

Recommended fix: Cross-message orphan filtering enhancement to validateAndFixToolResultIds() - scans history, filters tool_results referencing already-paired tool_uses. Single-file change (~30 lines), immediate protection, handles cross-message tool result scenarios. State cleanup on message deletion should be addressed separately.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Issue/PR - TriageNew issue. Needs quick review to confirm validity and assign labels.bugSomething isn't working

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions