WIP: heartbeat 'challenge' action — anti-drift audit of shared memory#70
Draft
BobbyZhouZijian wants to merge 2 commits intomainfrom
Draft
WIP: heartbeat 'challenge' action — anti-drift audit of shared memory#70BobbyZhouZijian wants to merge 2 commits intomainfrom
BobbyZhouZijian wants to merge 2 commits intomainfrom
Conversation
Adds a plateau-triggered, global-scope heartbeat action that audits the notes/skills shared memory for unsupported assumptions, stale claims, and one-off skills that have been promoted to "common knowledge". Scope is intentionally narrow (review only): the action re-classifies shared content (validated/hypothesis/stale/disputed) and appends to challenge_log.md. It does NOT spawn counter-attempts or change the agent's current strategy — that "active" half is left for a follow-up. Skeleton only: - coral/hub/prompts/challenge.md — adversarial-audit prompt - coral/hub/heartbeat.py — register in DEFAULT_PROMPTS / GLOBAL / TRIGGER - coral/config.py — ship in default heartbeat list (every=10, plateau, global) - tests/test_heartbeat.py — registration + default-config tests Refs #67 (memory poisoning / shared-memory drift) Refs #49 (Coyote / mandatory adversarial dissent) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
12 tasks
Drift can accumulate during a healthy upward run, not only when scores plateau — one note becoming consensus on dimension A doesn't show up as a score regression if scores are still climbing on dimension B. A plateau trigger therefore audits too late; interval gives a predictable cadence that catches drift before it locks in. The audit is review-only (no counter-attempts) so running it on a fixed cadence is cheap. - DEFAULT_TRIGGER["challenge"] = "interval" - config default: every=10, is_global=True (no explicit trigger; "interval" is the default) - prompt opening reframed: not plateau-specific - tests updated Refs #67, #49 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Status: WIP / draft
Skeleton only. Prompt + registration + tests. No active counter-attempts — review-half only, per maintainer scoping decision.
Refs #67 (shared-memory drift / memory poisoning)
Refs #49 (Coyote / mandatory adversarial dissent)
Why
#67 raises that shared notes/skills can quietly homogenize agents over long runs (one agent's hallucination becomes group doctrine). #49 proposes a heavier governance stack but the one well-grounded primitive in it — a mandatory adversarial pass — fits CORAL's existing heartbeat machinery without buying the rest of that framework.
This PR adds that primitive and stops there.
What this PR does
New prompt template
coral/hub/prompts/challenge.mdinstructing an agent to:status: validated | hypothesis | stale | disputedso future agents see what was once believed.challenge_log.mdsummarizing the audit.Register
challengeincoral/hub/heartbeat.py:DEFAULT_GLOBAL["challenge"] = True— population-level concern, one pass per run is enough.DEFAULT_TRIGGER["challenge"] = "plateau"— drift matters when scores stop improving, not on a fixed clock.Add to default heartbeat list in
coral/config.py:every=10, is_global=True, trigger="plateau".Tests asserting registration and default-config wiring.
Why these defaults
lint_wiki—lint_wikidoes janitorial work (merge duplicates, fix orphans).challengequestions whether the surviving content is true. The prompt explicitly calls this out.Out of scope (deliberately)
The "active" half of the anti-drift design — running attempts that deliberately violate consensus (a
agents.challenger_fractionknob, or a sharing-disabled agent slot) — is not in this PR. Heartbeats interrupt an existing agent rather than spawn a divergent one, so that piece belongs to agent-spawning, not heartbeat. Tracked for a follow-up.Execution plan (for reviewers / next steps)
This PR is the skeleton; landing the full feature needs:
examples/.challenge_log.mdshould be schema'd (YAML frontmatter per entry) so the UI can render it; currently free-form markdown.coral notes --status disputedfilter so disputed notes are easy to find.agents.challenger_fractionconfig knob to spawn a fraction of agents in sharing-disabled "challenger" mode. Separate PR.every=10plateau threshold is reasonable across task types or needs to be task-specific.Test plan
uv run pytest tests/test_heartbeat.py tests/test_config.py -v— 33 passeduv run pytest tests/— 119 passeduv run ruff checkon touched files — cleancoral startrun to confirm the action surfaces incoral heartbeatlisting and fires after a plateau.🤖 Generated with Claude Code