feat(core): enhance loop detection with thought and action stagnation checks#1
feat(core): enhance loop detection with thought and action stagnation checks#1euxaristia wants to merge 5 commits intomainfrom
Conversation
When a tool call repeatedly fails schema validation with the same error (e.g., ask_user_question with malformed params), the model enters an infinite retry loop burning tokens. After 3 consecutive validation failures for the same tool, inject a strong stop directive into the error response telling the model to change approach. The retry counter resets when: - Any tool call succeeds (clears all counters) - A different tool is called (clears previous tool's counter) Adds unit tests for the retry detection behavior. Fixes the class of loop where the model repeatedly generates the same malformed tool call parameters and never recovers.
EAGAIN (resource temporarily unavailable) is a transient non-blocking read error from node-pty that should be treated as expected rather than crashing the process. Also fix the settings schema generation script that failed under Bun by using node --import tsx/esm instead of npx tsx.
Addressed all feedback from PR QwenLM#3178: 1. Fixed validation retry counter keying to use both tool name and error message to prevent different validation errors on the same tool from accumulating 2. Improved EAGAIN error handling to only suppress read-related EAGAIN errors rather than all EAGAIN errors globally 3. Fixed validation retry counter reset behavior to only clear counters for the specific tool that succeeded, not all counters
Prevents Rust build artifacts from being committed in the future, specifically: - packages/sdk-rust/target/ directory containing build outputs - Cargo.lock file (if not meant to be version controlled) This addresses the issue where PRs became bloated with temporary build files.
… checks Enhance the LoopDetectionService with additional loop detection patterns to catch more subtle looping behaviors that were previously undetected. New Loop Detection Patterns: - Repetitive Thoughts: Detects when the model produces the same thought patterns repeatedly, indicating cognitive loops - Read File Loop: Detects excessive file read operations without meaningful progress - Action Stagnation: Tracks turns without meaningful action progress, catching subtle loops where the model keeps performing different but equally unproductive actions Improvements: - Added REPETITIVE_THOUGHTS, READ_FILE_LOOP, and ACTION_STAGNATION to the LoopType enum in telemetry types - Enhanced LoopDetectionService with thought tracking and action stagnation counters - Improved test coverage for new detection patterns
There was a problem hiding this comment.
Code Review
This pull request introduces several enhancements to loop detection and error handling, including new detection patterns for repetitive thoughts, file read loops, and action stagnation. It also improves PTY error handling and updates build scripts. However, several issues were identified in the implementation of the validation retry logic and loop detection services. Specifically, there are logic errors in how retry counts are reset and tracked due to key format mismatches, hardcoded thresholds that ignore defined constants, and redundant class members that are either unused or unnecessarily complex. Addressing these points will improve the reliability and maintainability of the new loop detection features.
| switch (newStatus) { | ||
| case 'success': { | ||
| // Successful execution only resets retry state for this tool | ||
| this.validationRetryCounts.delete(currentCall.request.name); |
There was a problem hiding this comment.
The logic to reset validation retry counts on successful execution is incorrect. The key used is currentCall.request.name, but the keys in validationRetryCounts are in the format toolName:errorMessage. This delete operation will have no effect.
To correctly clear all validation errors for a successfully executed tool, you should iterate over the map and remove all entries that start with the tool's name.
for (const key of this.validationRetryCounts.keys()) {
if (key.startsWith(`${currentCall.request.name}:`)) {
this.validationRetryCounts.delete(key);
}
}| const prevTool = this.validationRetryCounts.keys().next().value; | ||
| const hasPrevFailingTool = requestsToProcess.some( | ||
| (r) => r.name === prevTool, | ||
| ); | ||
| if (!hasPrevFailingTool) { | ||
| this.validationRetryCounts.clear(); | ||
| } |
There was a problem hiding this comment.
The logic to detect if a batch of tool calls continues a validation loop is flawed. prevTool is a string in the format toolName:errorMessage, but it's being compared directly with r.name, which is just the tool name. This comparison will always be false. Consequently, validationRetryCounts will be cleared incorrectly as soon as a tool call request arrives that doesn't exactly match the full error key string of a previous failure.
const prevFailingToolNames = new Set(
[...this.validationRetryCounts.keys()].map((key) => key.split(':')[0]),
);
const hasPrevFailingTool = requestsToProcess.some((r) =>
prevFailingToolNames.has(r.name),
);
if (!hasPrevFailingTool) {
this.validationRetryCounts.clear();
}| if (count >= 3) { | ||
| // Inject strong stop directive after 3 consecutive failures with same error | ||
| const stopDirective = | ||
| '\n⚠️ RETRY LOOP DETECTED: This tool call has failed validation multiple times with the same error. STOP retrying the same approach and try a fundamentally different strategy or ask for help.'; | ||
| finalError = new Error(`${baseError.message}${stopDirective}`); | ||
| } |
There was a problem hiding this comment.
There are a couple of maintainability issues here:
- The threshold
3is hardcoded. You've definedVALIDATION_RETRY_LOOP_THRESHOLDat the top of the file, which should be used here instead. - A local
stopDirectiveconstant is defined with a message that is slightly different from the globalRETRY_LOOP_STOP_DIRECTIVE. This is redundant. You should use the global constant for consistency and easier maintenance.
if (count >= VALIDATION_RETRY_LOOP_THRESHOLD) {
// Inject strong stop directive after 3 consecutive failures with same error
finalError = new Error(`${baseError.message}${RETRY_LOOP_STOP_DIRECTIVE}`);
}| private repetitiveThoughtCount = 0; | ||
|
|
||
| // File read tracking | ||
| private fileReadCount = 0; |
There was a problem hiding this comment.
The class member this.fileReadCount is redundant. It's incremented on every read-like tool call (line 404) but it's never used for loop detection; checkReadFileLoop correctly calculates the count within a sliding window of recent tool calls. This unused member can cause confusion and should be removed from the class, from the trackToolCall method (line 404), and from the reset method (line 475).
| if (repetitiveCount >= THOUGHT_REPEAT_THRESHOLD) { | ||
| this.repetitiveThoughtCount++; | ||
| if (this.repetitiveThoughtCount >= 1) { // Trigger immediately on detection | ||
| logLoopDetected( | ||
| this.config, | ||
| new LoopDetectedEvent( | ||
| LoopType.REPETITIVE_THOUGHTS, | ||
| this.promptId, | ||
| ), | ||
| ); | ||
| return true; | ||
| } | ||
| } else { | ||
| this.repetitiveThoughtCount = 0; | ||
| } |
There was a problem hiding this comment.
The logic in checkRepetitiveThoughts can be simplified. The this.repetitiveThoughtCount variable is incremented and then immediately checked if it's >= 1, which is always true. The variable is not necessary since the function returns immediately upon detecting a loop.
You can remove this.repetitiveThoughtCount from the class (line 61) and reset() method (line 474), and simplify this logic.
if (repetitiveCount >= THOUGHT_REPEAT_THRESHOLD) {
logLoopDetected(
this.config,
new LoopDetectedEvent(
LoopType.REPETITIVE_THOUGHTS,
this.promptId,
),
);
return true;
}
This PR adds new loop detection patterns to catch more subtle looping behaviors that were previously undetected, replacing the functionality intended for the closed PR QwenLM#3176. This includes repetitive thoughts detection, read file loop detection, and action stagnation detection.