fix(self-healing): oscillation guard prevents A→B→A→A patch loops#221
fix(self-healing): oscillation guard prevents A→B→A→A patch loops#221Gradata wants to merge 1 commit into
Conversation
Real-world bug observed 2026-05-21 on production brain: lesson 911130b3 oscillated between two rule phrasings A and B for 5 consecutive rollbacks spanning 20 days, each marked '100% reduction' in the dashboard. Root cause: `observe_patch()` had no cycle detection — every time the compliance scorer flagged the current text as 'failing,' the patcher rewrote it back to the previous text without checking it had just patched away from that text. The 'reduction' metric games itself: the new text trivially shows zero failures because it has zero observations yet. Fix: new module `_oscillation_guard.py` is consulted before each `observe_patch()` call. Detects direct A→B then B→A cycles within a 30-day / 5-patch lookback window. On detection, emits a `rule_patch_cycle_detected` event and aborts the patch instead of recording another fake-reduction row. Conservative scope (only direct cycles, only within a single category, whitespace-normalized comparison) to minimize false-positive risk. The next sibling fix (`recurrence_change → insufficient_data when <3 sessions`) is filed as a separate cloud-side issue [040a09dd]. Tests: 12 new (oscillation_guard) + 166 existing self-healing/patch tests still green. Refs: paperclip issue 1983a5c6
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 Walkthrough
WalkthroughThis PR introduces oscillation prevention and patch telemetry for rule self-improvement. A new ChangesRule Oscillation Guard and Patch Telemetry
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.21.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Gradata/tests/test_oscillation_guard.py (1)
194-232: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd a regression test for one-time resolution semantics.
Please add a deterministic unit test that calls
resolve_patch_compliance()twice and asserts the same original patch is not resolved/emitted twice.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Gradata/tests/test_oscillation_guard.py` around lines 194 - 232, Add a deterministic unit test that uses _FakeBrain to simulate a patch, calls resolve_patch_compliance(brain, <rule_id>, <old>, <new>) twice, and asserts the first call returns/emits the expected "rule_patch_resolved" (or appropriate resolved event) while the second call does not emit a duplicate resolution (e.g., returns None or an event with a different type), thereby verifying one-time resolution semantics; locate resolve_patch_compliance, _FakeBrain and observe_patch in the test file to mirror setup used by existing tests (create the same initial patch before calling resolve_patch_compliance twice and assert identical inputs produce only a single resolved event).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`:
- Around line 114-120: The try/except around brain.query_events in
_oscillation_guard.py currently swallows exceptions and returns None; update the
exception handlers (the block around brain.query_events and the similar block at
lines ~189-190) to keep the fail-open behavior but log the exception context by
calling logger.warning with a descriptive message and exc_info=True (e.g.,
"cycle guard: failed to query events, failing open") so outages are diagnosable;
locate and modify the handlers around brain.query_events and the second silent
except to replace bare excepts with logging that includes exc_info=True while
still returning None.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py`:
- Around line 37-44: The try/except around the call to brain.query_events
currently swallows exceptions and returns 0; replace the bare except with a log
call that preserves the existing fallback behavior by calling
logger.warning("Failed to query events", exc_info=True) (or similar) before
returning 0 so telemetry failures are observable. Do the same change for the
other bare except blocks in this module that return fallbacks (the ones around
the other brain.* calls) — add a logger.warning with exc_info=True in each
except and keep the original return value. Ensure you reference the same symbols
(e.g., brain.query_events and the other brain.* calls) and do not change the
fallback logic.
- Around line 60-66: The session detection currently swallows all errors with a
bare except, which hides failures; modify the try/except around
brain.query_events(...) and the subsequent max(...) call to catch Exception as e
(or narrower exceptions if known), log the error with the module logger (e.g.,
logger.warning("session detection failed", exc_info=True) or similar) and then
return the default 1; reference the brain.query_events call and the
events/max((e.get("session") ...) expression when making the change so you only
replace the bare except: pass with a typed exception handler that logs
exc_info=True before returning 1.
- Around line 154-193: The loop that resolves patches can run multiple times
because the original source event (ev) is never updated with
"observed_compliance_after_3_sessions", so add an atomic update/write to the
original event before emitting "rule_patch_observed": after computing
compliance_after (in the pending_events loop where ev, data, compliance_after
and compliance_before are computed) mutate/persist the source event's data to
set "observed_compliance_after_3_sessions" (and optionally "resolution_session"
or "compliance_improved") using the same event persistence API your codebase
uses, then call brain.emit("rule_patch_observed",
"_patches.resolve_patch_compliance", ...). This ensures the guard
data.get("observed_compliance_after_3_sessions") will be present on subsequent
runs and prevents duplicate "resolved" emissions; ensure the update is performed
before or as part of the emit so retries do not produce duplicates.
---
Outside diff comments:
In `@Gradata/tests/test_oscillation_guard.py`:
- Around line 194-232: Add a deterministic unit test that uses _FakeBrain to
simulate a patch, calls resolve_patch_compliance(brain, <rule_id>, <old>, <new>)
twice, and asserts the first call returns/emits the expected
"rule_patch_resolved" (or appropriate resolved event) while the second call does
not emit a duplicate resolution (e.g., returns None or an event with a different
type), thereby verifying one-time resolution semantics; locate
resolve_patch_compliance, _FakeBrain and observe_patch in the test file to
mirror setup used by existing tests (create the same initial patch before
calling resolve_patch_compliance twice and assert identical inputs produce only
a single resolved event).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 80abbafa-c945-4493-8d1d-a53f115c5f64
📒 Files selected for processing (3)
Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.pyGradata/src/gradata/enhancements/self_improvement/_patches.pyGradata/tests/test_oscillation_guard.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest ubuntu-latest / py3.11
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest (py3.12)
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/self_improvement/_patches.pyGradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_oscillation_guard.py
| try: | ||
| events = brain.query_events( | ||
| event_type="rule_patch_observed", | ||
| limit=200, | ||
| ) | ||
| except Exception: | ||
| return None # Fail open — never block patches on a query failure. |
There was a problem hiding this comment.
Log exception context when failing open.
Line 119 and Line 189 suppress errors silently. Keep fail-open behavior, but add logger.warning(..., exc_info=True) so cycle-guard outages are diagnosable in production.
Proposed fix
from __future__ import annotations
import os
+import logging
from datetime import UTC, datetime, timedelta
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from gradata.brain import Brain
+
+logger = logging.getLogger(__name__)
@@
- except Exception:
- return None # Fail open — never block patches on a query failure.
+ except Exception:
+ logger.warning("detect_cycle query failed; failing open", exc_info=True)
+ return None # Fail open — never block patches on a query failure.
@@
- except Exception:
- return {}
+ except Exception:
+ logger.warning("emit_cycle_detected failed", exc_info=True)
+ return {}As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.
Also applies to: 189-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`
around lines 114 - 120, The try/except around brain.query_events in
_oscillation_guard.py currently swallows exceptions and returns None; update the
exception handlers (the block around brain.query_events and the similar block at
lines ~189-190) to keep the fail-open behavior but log the exception context by
calling logger.warning with a descriptive message and exc_info=True (e.g.,
"cycle guard: failed to query events, failing open") so outages are diagnosable;
locate and modify the handlers around brain.query_events and the second silent
except to replace bare excepts with logging that includes exc_info=True while
still returning None.
| try: | ||
| events = brain.query_events( | ||
| event_type="RULE_FAILURE", | ||
| last_n_sessions=lookback_sessions, | ||
| limit=500, | ||
| ) | ||
| except Exception: | ||
| return 0 |
There was a problem hiding this comment.
Add warning logs for swallowed telemetry exceptions.
These paths currently fail silently and return fallback values. Add warning logs with exc_info=True so production telemetry failures are observable.
As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.
Also applies to: 128-129, 147-149, 192-193
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
37 - 44, The try/except around the call to brain.query_events currently swallows
exceptions and returns 0; replace the bare except with a log call that preserves
the existing fallback behavior by calling logger.warning("Failed to query
events", exc_info=True) (or similar) before returning 0 so telemetry failures
are observable. Do the same change for the other bare except blocks in this
module that return fallbacks (the ones around the other brain.* calls) — add a
logger.warning with exc_info=True in each except and keep the original return
value. Ensure you reference the same symbols (e.g., brain.query_events and the
other brain.* calls) and do not change the fallback logic.
| try: | ||
| events = brain.query_events(limit=10) | ||
| if events: | ||
| return max((e.get("session") or 0) for e in events) | ||
| except Exception: | ||
| pass | ||
| return 1 |
There was a problem hiding this comment.
Replace bare except: pass in session detection.
Line 64-65 uses a silent bare pass, which hides failures in compliance resolution scheduling.
Proposed fix
+import logging
@@
+logger = logging.getLogger(__name__)
@@
- except Exception:
- pass
+ except Exception:
+ logger.warning("_get_current_session failed; defaulting to session=1", exc_info=True)
return 1As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
60 - 66, The session detection currently swallows all errors with a bare except,
which hides failures; modify the try/except around brain.query_events(...) and
the subsequent max(...) call to catch Exception as e (or narrower exceptions if
known), log the error with the module logger (e.g., logger.warning("session
detection failed", exc_info=True) or similar) and then return the default 1;
reference the brain.query_events call and the events/max((e.get("session") ...)
expression when making the change so you only replace the bare except: pass with
a typed exception handler that logs exc_info=True before returning 1.
| for ev in pending_events: | ||
| data = ev.get("data", {}) | ||
| if data.get("observed_compliance_after_3_sessions") is not None: | ||
| continue | ||
|
|
||
| patch_session = ev.get("session") or 0 | ||
| if current_session - patch_session < min_session_gap: | ||
| continue | ||
|
|
||
| category = data.get("category", "") | ||
| new_rule_text = data.get("new_rule_text", "") | ||
|
|
||
| compliance_after = _count_failures_for_rule(brain, category, new_rule_text) | ||
| compliance_before = data.get("observed_compliance_before") or 0 | ||
| improved = compliance_after < compliance_before | ||
|
|
||
| try: | ||
| updated = brain.emit( | ||
| "rule_patch_observed", | ||
| "_patches.resolve_patch_compliance", | ||
| { | ||
| **data, | ||
| "observed_compliance_after_3_sessions": compliance_after, | ||
| "compliance_improved": improved, | ||
| "resolution_session": current_session, | ||
| "original_event_id": ev.get("id"), | ||
| }, | ||
| [f"category:{category}", "self_healing", _PATCH_TAG, "resolved"], | ||
| ) | ||
| updates.append( | ||
| { | ||
| "category": category, | ||
| "compliance_before": compliance_before, | ||
| "compliance_after": compliance_after, | ||
| "improved": improved, | ||
| "event": updated if isinstance(updated, dict) else {}, | ||
| } | ||
| ) | ||
| except Exception: | ||
| continue |
There was a problem hiding this comment.
Make compliance resolution idempotent to prevent duplicate “resolved” emissions.
The unresolved source event is never mutated, so repeated runs can resolve the same patch multiple times. This will skew acceptance telemetry and create duplicate audit rows.
Proposed fix
def resolve_patch_compliance(
@@
- current_session = _get_current_session(brain)
+ current_session = _get_current_session(brain)
+ already_resolved_ids = {
+ (e.get("data", {}) or {}).get("original_event_id")
+ for e in pending_events
+ if (e.get("data", {}) or {}).get("observed_compliance_after_3_sessions") is not None
+ }
@@
for ev in pending_events:
data = ev.get("data", {})
if data.get("observed_compliance_after_3_sessions") is not None:
continue
+ if ev.get("id") in already_resolved_ids:
+ continue
@@
try:
updated = brain.emit(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
154 - 193, The loop that resolves patches can run multiple times because the
original source event (ev) is never updated with
"observed_compliance_after_3_sessions", so add an atomic update/write to the
original event before emitting "rule_patch_observed": after computing
compliance_after (in the pending_events loop where ev, data, compliance_after
and compliance_before are computed) mutate/persist the source event's data to
set "observed_compliance_after_3_sessions" (and optionally "resolution_session"
or "compliance_improved") using the same event persistence API your codebase
uses, then call brain.emit("rule_patch_observed",
"_patches.resolve_patch_compliance", ...). This ensures the guard
data.get("observed_compliance_after_3_sessions") will be present on subsequent
runs and prevents duplicate "resolved" emissions; ensure the update is performed
before or as part of the emit so retries do not produce duplicates.
Problem
Real-world bug observed 2026-05-21 on production brain: lesson
911130b3showed 5 consecutive ROLLBACK rows on the Self-Healing dashboard ping-ponging between exactly two rule texts (see screenshot):Each rollback claimed "-62 (100% reduction)" — mathematically meaningless because the new text has zero observations at the moment of measurement.
Root cause
_patches.py::observe_patch()had no cycle detection. Every time the compliance scorer flagged the current rule as "failing this session," the patcher rewrote it back to the previous text. The "reduction %" metric games itself: the new text trivially shows zero failures because it hasn't been observed yet.Fix
New module
enhancements/self_improvement/_oscillation_guard.py:detect_cycle()checks the last Nrule_patch_observedevents (default 5, 30-day window) for any prior patch wherenew_text == proposed_old_text AND old_text == proposed_new_text— i.e. a direct A→B then B→A cycle.rule_patch_cycle_detectedevent with the matched prior event ID + a recommended lock duration (10 sessions, configurable).observe_patch()now consults the guard before recording the patch. If a cycle is detected, norule_patch_observedevent is emitted and the dashboard's Self-Healing tab gets a singlecycle_detectedrow instead of another fake-reduction entry.Out of scope (separate issues)
cloud/app/routes/rule_patches.py::_recurrence_changeshould returninsufficient_datawhen sessions_observed < 3 (filed as [040a09dd])Tests
tests/test_oscillation_guard.pycovering empty history, direct cycle, self-identity, category isolation, whitespace normalization, lookback expiry, and the integration path throughobserve_patch().Verification
Closes: paperclip issue 1983a5c6 (BUG-P0: self-healing oscillates)