
WORKBOOK v4 — 24/7 Autonomous Dev Fleet (F0–F6) #28

Open
CTlanston wants to merge 1 commit into main from workbook/v4-autonomous-fleet

@CTlanston (Owner)

What

Adds WORKBOOK_v4.md — the next-stage execution plan for Codex: take the system from a single-operator, human-in-the-loop, one-mission cockpit to a continuous, concurrent, multi-agent 24/7 autonomous dev team (Anthropic-internal style).

Supersedes #27 / WORKBOOK_v3.1 — v3.1's gap-closure (real Gemini, real E2E) is absorbed as v4 F0. Recommend closing #27.

Why F0 comes first (engineering honesty)

The operator chose to go straight to the fleet. But an independent audit of a7d400f verified two open gaps: the isolated Gemini hard gate has only ever been mock-tested (a real Gemini has never judged a real diff), and the P7 E2E ran on a /tmp sandbox with no real Gemini verdict. An auto-merging 24/7 fleet built on an unexercised merge gate is dangerous, not autonomous — so F0 nails those gaps before any fleet behavior is added.

Stages (F0–F6; each stage is one commit and one PR, actually pushed; every revived package is ADR-gated)

  • F0 — Real-gate foundation: real Gemini PASS+FAIL on real diffs, real E2E on a registered repo (hermus-agent), and an operator-absent fail-closed merge test. (absorbs v3.1 H1–H2)
  • F1 — Single-lane autonomous loop: wire roadmap-agent → approval → mission queue; one mission runs hands-off; sub-95% clarification escalates to phone + HOLDs that mission without blocking the fleet.
  • F2 — Concurrency (the fleet): revive agent-mesh (fan-out/fan-in) + cli-robust (session pool + quota oracle); ≥3 concurrent missions across repos; quota exhaustion → HOLD, never a paid-API fallback.
  • F3 — Durable orchestration: revive moves/saga; kill -9 mid-mission → resume exactly-once; idempotent side effects.
  • F4 — Risk-gated autonomous merge + phone control: low→auto-merge (flag-gated), medium→phone approval, high→block; revive interrupt-bus.
  • F5 — Fleet observability + unattended safety: simple fleet view (drills into the single-conversation cockpit) + revive sentinel for tool-call interception.
  • F6 — Continuous 24/7 soak + chaos: a real soak window proving missions complete, holds auto-resolve, 0 safety violations, recovery from fault injection.
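F4's risk routing can be read as a small pure function from a mission's risk level to a merge route. The following is a minimal Python sketch under stated assumptions, not code from the workbook: `Risk`, `Route`, and `route_merge` are hypothetical names, and sending a low-risk mission to phone approval when the auto-merge flag is off is an assumption (the PR only says low-risk auto-merge is flag-gated).

```python
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Route(Enum):
    AUTO_MERGE = "auto_merge"          # merge without a human
    PHONE_APPROVAL = "phone_approval"  # push to the operator's phone and wait
    BLOCK = "block"                    # never merge


def route_merge(risk: Optional[Risk], auto_merge_enabled: bool) -> Route:
    """Map a mission's risk level to a merge route (hypothetical F4 policy)."""
    if risk is Risk.LOW:
        # Assumption: with the auto-merge flag off, low risk falls back to
        # phone approval rather than being blocked outright.
        return Route.AUTO_MERGE if auto_merge_enabled else Route.PHONE_APPROVAL
    if risk is Risk.MEDIUM:
        return Route.PHONE_APPROVAL
    # HIGH risk, or a missing/unrecognized risk level: fail closed.
    return Route.BLOCK
```

For example, `route_merge(Risk.MEDIUM, auto_merge_enabled=True)` returns `Route.PHONE_APPROVAL`. Making `BLOCK` the fall-through branch means a bug or gap in risk scoring can only make the fleet more conservative, never less.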

Standing safety rules (new fleet ground rules GF1–GF5)

  • Unattended merge is fail-closed: it requires Gemini PASS ∧ risk ≤ threshold ∧ allow_remote_writes ∧ repo.enabled ∧ no forbidden path.
  • Reviving any parked package requires an ADR.
  • Concurrency and budget caps, with quota exhaustion → HOLD (never a paid-API fallback).
  • Idempotent and resumable execution (exactly-once merge).
  • Gates route to the phone with no self-approval, and one mission's HOLD never blocks the fleet.

Inherits all WORKBOOK_v3/v3.1 ground rules.
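The fail-closed merge rule is a single conjunction in which missing data can only deny. A minimal Python sketch, assuming a numeric risk score and glob-style forbidden paths; `MergeCandidate`, `may_auto_merge`, and all parameter names are hypothetical, with `allow_remote_writes` and `repo_enabled` standing in for the allow_remote_writes and repo.enabled settings in the rule.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class MergeCandidate:
    gemini_verdict: Optional[str]  # "PASS", "FAIL", or None if no real verdict arrived
    risk_score: Optional[float]    # assumed numeric; None if scoring failed
    changed_paths: Sequence[str]   # repo-relative paths touched by the diff


def may_auto_merge(
    c: MergeCandidate,
    risk_threshold: float,
    allow_remote_writes: bool,
    repo_enabled: bool,
    forbidden_globs: Sequence[str],
) -> bool:
    """Return True only if every gate condition is positively satisfied."""
    # Gemini must have actually judged this diff and said PASS.
    if c.gemini_verdict != "PASS":
        return False
    # An unknown risk score denies the merge instead of defaulting to "safe".
    if c.risk_score is None or c.risk_score > risk_threshold:
        return False
    # Both the remote-write switch and the per-repo switch must be on.
    if not (allow_remote_writes and repo_enabled):
        return False
    # Any changed path matching a forbidden pattern blocks the merge.
    for path in c.changed_paths:
        if any(fnmatchcase(path, pattern) for pattern in forbidden_globs):
            return False
    return True
```

Every early return is a denial, so a timeout, a missing verdict, or an unscored diff all land on "no merge"; `True` is reachable only when each conjunct has been positively checked.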

See WORKBOOK_v4.md §3 for the full per-stage L1 acceptance criteria.

🤖 Generated with Claude Code

Next-stage workbook (operator chose to go straight to the 24/7 fleet). Supersedes WORKBOOK_v3.1/PR #27: v3.1's gap-closure becomes v4 F0. Moves the system from a single-operator, human-in-the-loop, one-mission cockpit to a continuous, concurrent, multi-agent autonomous dev team. F0 first closes the two audit-verified gaps (real Gemini verdict, real E2E), because an auto-merging fleet on an unexercised Gemini gate is dangerous, not autonomous. F1–F6 then revive the parked packages (agent-mesh/cli-robust/moves/interrupt-bus/sentinel/chaos), each ADR-gated, to add autonomous intake, concurrency, crash recovery, risk-gated auto-merge, fleet observability, and a continuous soak.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>