WIP: heartbeat 'challenge' action — anti-drift audit of shared memory by BobbyZhouZijian · Pull Request #70 · Human-Agent-Society/CORAL

BobbyZhouZijian · 2026-04-25T13:40:56Z

Status: WIP / draft

Skeleton only. Prompt + registration + tests. No active counter-attempts — review-half only, per maintainer scoping decision.

Refs #67 (shared-memory drift / memory poisoning)
Refs #49 (Coyote / mandatory adversarial dissent)

Why

#67 raises that shared notes/skills can quietly homogenize agents over long runs (one agent's hallucination becomes group doctrine). #49 proposes a heavier governance stack but the one well-grounded primitive in it — a mandatory adversarial pass — fits CORAL's existing heartbeat machinery without buying the rest of that framework.

This PR adds that primitive and stops there.

What this PR does

New prompt template coral/hub/prompts/challenge.md instructing an agent to:
- Identify the highest-traffic notes/skills.
- Adversarially try to falsify each (evidence check, generalization check, staleness check, counter-search against top attempts that didn't follow it).
- Re-classify, do not delete — set status: validated | hypothesis | stale | disputed so future agents see what was once believed.
- Append a dated entry to challenge_log.md summarizing the audit.
- Hand back without changing the agent's current strategy.
Register challenge in coral/hub/heartbeat.py:
- DEFAULT_GLOBAL["challenge"] = True — population-level concern, one pass per run is enough.
- DEFAULT_TRIGGER["challenge"] = "plateau" — drift matters when scores stop improving, not on a fixed clock.
Add to default heartbeat list in coral/config.py: every=10, is_global=True, trigger="plateau".
Tests asserting registration and default-config wiring.
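The registration and default-config wiring described above can be sketched roughly as follows. This is a minimal illustration, not the actual CORAL code: the two dictionaries are local stand-ins for the registries in coral/hub/heartbeat.py, and the `HeartbeatAction` dataclass and `default_challenge_action` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Local stand-ins for the registries in coral/hub/heartbeat.py (assumed to be plain dicts).
DEFAULT_GLOBAL: dict[str, bool] = {}
DEFAULT_TRIGGER: dict[str, str] = {}

# The two registrations this PR describes.
DEFAULT_GLOBAL["challenge"] = True        # population-level: one pass per run
DEFAULT_TRIGGER["challenge"] = "plateau"  # fire when scores stop improving


@dataclass(frozen=True)
class HeartbeatAction:
    """Hypothetical shape of one entry in the default heartbeat list."""

    name: str
    every: int
    is_global: bool = False
    trigger: str = "interval"


def default_challenge_action() -> HeartbeatAction:
    """Build the default-config entry from the registries above."""
    return HeartbeatAction(
        name="challenge",
        every=10,
        is_global=DEFAULT_GLOBAL["challenge"],
        trigger=DEFAULT_TRIGGER["challenge"],
    )
```

Deriving the config entry from the registries keeps the two in agreement, which is what the registration and default-config tests are meant to assert.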

Why these defaults

Plateau trigger, not interval — adversarial review pays off when the system has stopped finding gains. On a healthy upward trajectory it would just burn turns.
Global, not per-agent — sycophantic convergence is a population property. One challenger pass across the run is the right granularity.
Re-classify, not delete — preserves the institutional record of what was once believed, so future agents can re-evaluate rather than relearn from scratch.
Distinct from lint_wiki — lint_wiki does janitorial work (merge duplicates, fix orphans). challenge questions whether the surviving content is true. The prompt explicitly calls this out.
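To make the plateau-versus-interval choice concrete, here is one possible way a plateau check could be written. It is a sketch under stated assumptions, not CORAL's detector: the function name `is_plateau`, the `window` and `min_delta` parameters, and the convention that higher scores are better are all hypothetical.

```python
def is_plateau(scores: list[float], window: int = 10, min_delta: float = 0.0) -> bool:
    """Return True if the last `window` scores did not beat the earlier best.

    Assumes higher scores are better. With too little history to compare
    against, the run is treated as not (yet) plateaued.
    """
    if len(scores) <= window:
        return False
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before <= min_delta


# A run that stalls at 0.3 for ten evaluations counts as a plateau;
# a steadily improving run does not.
stalled = [0.1, 0.2, 0.3] + [0.3] * 10
improving = [i / 20 for i in range(20)]
```

Under this definition a healthy upward run never triggers the audit, which is the cost-saving behaviour the plateau default is after.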

Out of scope (deliberately)

The "active" half of the anti-drift design, running attempts that deliberately violate consensus (an agents.challenger_fraction knob, or a sharing-disabled agent slot), is not in this PR. Heartbeats interrupt an existing agent rather than spawn a divergent one, so that piece belongs to agent-spawning, not heartbeat. Tracked for a follow-up.

Execution plan (for reviewers / next steps)

This PR is the skeleton; landing the full feature needs:

Prompt + registration (this PR)
Default-config entry (this PR)
Tests for registration + config defaults (this PR)
Field test on a real run — verify the prompt produces useful re-classifications, not noise. Best validated on a long-running task config from examples/.
Decide whether challenge_log.md should be schema'd (YAML frontmatter per entry) so the UI can render it; currently free-form markdown.
Consider a coral notes --status disputed filter so disputed notes are easy to find.
Active half: agents.challenger_fraction config knob to spawn a fraction of agents in sharing-disabled "challenger" mode. Separate PR.
Eval whether every=10 plateau threshold is reasonable across task types or needs to be task-specific.
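For the open question of a schema'd challenge_log.md, one possible entry shape is sketched below: a small YAML-style frontmatter block with per-status counts, followed by a markdown list of re-classifications. The field names (`date`, `action`, the count keys), the `Reclassification` dataclass, and `render_log_entry` are assumptions made for illustration; only the four status values come from this PR.

```python
from dataclasses import dataclass
from datetime import date

# The four statuses named in this PR.
VALID_STATUSES = ("validated", "hypothesis", "stale", "disputed")


@dataclass(frozen=True)
class Reclassification:
    """Hypothetical record of one note or skill being re-classified."""

    note: str
    status: str
    reason: str


def render_log_entry(day: date, changes: list[Reclassification]) -> str:
    """Render one dated log entry: frontmatter with counts, then a bullet list."""
    for change in changes:
        if change.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {change.status!r}")
    counts = {s: sum(c.status == s for c in changes) for s in VALID_STATUSES}
    lines = ["---", f"date: {day.isoformat()}", "action: challenge"]
    lines += [f"{status}: {count}" for status, count in counts.items()]
    lines.append("---")
    lines += [f"- {c.note}: {c.status} ({c.reason})" for c in changes]
    return "\n".join(lines) + "\n"


entry = render_log_entry(
    date(2026, 4, 25),
    [Reclassification("notes/example.md", "disputed", "top attempts did not follow it")],
)
```

Keeping the counts in frontmatter would let a UI summarize an audit without parsing the free-form body, while the body stays readable as plain markdown.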

Test plan

uv run pytest tests/test_heartbeat.py tests/test_config.py -v — 33 passed
uv run pytest tests/ — 119 passed
uv run ruff check on touched files — clean
Smoke test on an actual coral start run to confirm the action surfaces in coral heartbeat listing and fires after a plateau.

🤖 Generated with Claude Code

Adds a plateau-triggered, global-scope heartbeat action that audits the notes/skills shared memory for unsupported assumptions, stale claims, and one-off skills that have been promoted to "common knowledge".

Scope is intentionally narrow (review only): the action re-classifies shared content (validated/hypothesis/stale/disputed) and appends to challenge_log.md. It does NOT spawn counter-attempts or change the agent's current strategy; that "active" half is left for a follow-up.

Skeleton only:
- coral/hub/prompts/challenge.md: adversarial-audit prompt
- coral/hub/heartbeat.py: register in DEFAULT_PROMPTS / GLOBAL / TRIGGER
- coral/config.py: ship in default heartbeat list (every=10, plateau, global)
- tests/test_heartbeat.py: registration + default-config tests

Refs #67 (memory poisoning / shared-memory drift)
Refs #49 (Coyote / mandatory adversarial dissent)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-04-25T13:41:00Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
coral	Ready	Preview, Comment	Apr 25, 2026 1:43pm

Drift can accumulate during a healthy upward run, not only when scores plateau: one note becoming consensus on dimension A doesn't show up as a score regression if scores are still climbing on dimension B. A plateau trigger therefore audits too late; interval gives a predictable cadence that catches drift before it locks in. The audit is review-only (no counter-attempts) so running it on a fixed cadence is cheap.

- DEFAULT_TRIGGER["challenge"] = "interval"
- config default: every=10, is_global=True (no explicit trigger; "interval" is the default)
- prompt opening reframed: not plateau-specific
- tests updated

Refs #67, #49

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

BobbyZhouZijian mentioned this pull request Apr 25, 2026

WIP: heartbeat 'challenge' action — anti-drift audit of shared memory #69

Closed

12 tasks

vercel Bot deployed to Preview April 25, 2026 13:43 View deployment

BobbyZhouZijian wants to merge 2 commits into main from feat/heartbeat-challenge
