Skip to content

WIP: heartbeat 'challenge' action — anti-drift audit of shared memory#70

Draft
BobbyZhouZijian wants to merge 2 commits intomainfrom
feat/heartbeat-challenge
Draft

WIP: heartbeat 'challenge' action — anti-drift audit of shared memory#70
BobbyZhouZijian wants to merge 2 commits intomainfrom
feat/heartbeat-challenge

Conversation

@BobbyZhouZijian
Copy link
Copy Markdown
Collaborator

Status: WIP / draft

Skeleton only. Prompt + registration + tests. No active counter-attempts — review-half only, per maintainer scoping decision.

Refs #67 (shared-memory drift / memory poisoning)
Refs #49 (Coyote / mandatory adversarial dissent)

Why

#67 raises that shared notes/skills can quietly homogenize agents over long runs (one agent's hallucination becomes group doctrine). #49 proposes a heavier governance stack but the one well-grounded primitive in it — a mandatory adversarial pass — fits CORAL's existing heartbeat machinery without buying the rest of that framework.

This PR adds that primitive and stops there.

What this PR does

  1. New prompt template coral/hub/prompts/challenge.md instructing an agent to:

    • Identify the highest-traffic notes/skills.
    • Adversarially try to falsify each (evidence check, generalization check, staleness check, counter-search against top attempts that didn't follow it).
    • Re-classify, do not delete — set status: validated | hypothesis | stale | disputed so future agents see what was once believed.
    • Append a dated entry to challenge_log.md summarizing the audit.
    • Hand back without changing the agent's current strategy.
  2. Register challenge in coral/hub/heartbeat.py:

    • DEFAULT_GLOBAL["challenge"] = True — population-level concern, one pass per run is enough.
    • DEFAULT_TRIGGER["challenge"] = "plateau" — drift matters when scores stop improving, not on a fixed clock.
  3. Add to default heartbeat list in coral/config.py: every=10, is_global=True, trigger="plateau".

  4. Tests asserting registration and default-config wiring.

Why these defaults

  • Plateau trigger, not interval — adversarial review pays off when the system has stopped finding gains. On a healthy upward trajectory it would just burn turns.
  • Global, not per-agent — sycophantic convergence is a population property. One challenger pass across the run is the right granularity.
  • Re-classify, not delete — preserves the institutional record of what was once believed, so future agents can re-evaluate rather than relearn from scratch.
  • Distinct from lint_wikilint_wiki does janitorial work (merge duplicates, fix orphans). challenge questions whether the surviving content is true. The prompt explicitly calls this out.

Out of scope (deliberately)

The "active" half of the anti-drift design — running attempts that deliberately violate consensus (a agents.challenger_fraction knob, or a sharing-disabled agent slot) — is not in this PR. Heartbeats interrupt an existing agent rather than spawn a divergent one, so that piece belongs to agent-spawning, not heartbeat. Tracked for a follow-up.

Execution plan (for reviewers / next steps)

This PR is the skeleton; landing the full feature needs:

  • Prompt + registration (this PR)
  • Default-config entry (this PR)
  • Tests for registration + config defaults (this PR)
  • Field test on a real run — verify the prompt produces useful re-classifications, not noise. Best validated on a long-running task config from examples/.
  • Decide whether challenge_log.md should be schema'd (YAML frontmatter per entry) so the UI can render it; currently free-form markdown.
  • Consider a coral notes --status disputed filter so disputed notes are easy to find.
  • Active half: agents.challenger_fraction config knob to spawn a fraction of agents in sharing-disabled "challenger" mode. Separate PR.
  • Eval whether every=10 plateau threshold is reasonable across task types or needs to be task-specific.

Test plan

  • uv run pytest tests/test_heartbeat.py tests/test_config.py -v — 33 passed
  • uv run pytest tests/ — 119 passed
  • uv run ruff check on touched files — clean
  • Smoke test on an actual coral start run to confirm the action surfaces in coral heartbeat listing and fires after a plateau.

🤖 Generated with Claude Code

Adds a plateau-triggered, global-scope heartbeat action that audits the
notes/skills shared memory for unsupported assumptions, stale claims, and
one-off skills that have been promoted to "common knowledge".

Scope is intentionally narrow (review only): the action re-classifies
shared content (validated/hypothesis/stale/disputed) and appends to
challenge_log.md. It does NOT spawn counter-attempts or change the agent's
current strategy — that "active" half is left for a follow-up.

Skeleton only:
- coral/hub/prompts/challenge.md   — adversarial-audit prompt
- coral/hub/heartbeat.py           — register in DEFAULT_PROMPTS / GLOBAL / TRIGGER
- coral/config.py                  — ship in default heartbeat list (every=10, plateau, global)
- tests/test_heartbeat.py          — registration + default-config tests

Refs #67 (memory poisoning / shared-memory drift)
Refs #49 (Coyote / mandatory adversarial dissent)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
coral Ready Ready Preview, Comment Apr 25, 2026 1:43pm

Request Review

Drift can accumulate during a healthy upward run, not only when scores
plateau — one note becoming consensus on dimension A doesn't show up as a
score regression if scores are still climbing on dimension B. A plateau
trigger therefore audits too late; interval gives a predictable cadence
that catches drift before it locks in.

The audit is review-only (no counter-attempts) so running it on a fixed
cadence is cheap.

- DEFAULT_TRIGGER["challenge"] = "interval"
- config default: every=10, is_global=True (no explicit trigger; "interval" is the default)
- prompt opening reframed: not plateau-specific
- tests updated

Refs #67, #49

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant