
WORKBOOK v3.1 — Real-Loop Hardening & Ship (H1–H4)#27

Open
CTlanston wants to merge 1 commit into main from workbook/v3.1-real-loop-hardening

@CTlanston
Owner

What

Adds WORKBOOK_v3.1.md — the next-cycle plan for Codex to execute. It is a hardening + ship workbook, not a new-feature one.

Why

An independent, evidence-grounded audit of the P0–P7 commit (a7d400f) found that the code is substantially real (6 of 8 phases solid), but two verified gaps and one documentation overclaim remain:

  1. P4 — the isolated Gemini hard gate has only ever been mock-tested. The block is real and fails closed in code, but every committed real smoke run shows Gemini as GEMINI_NOT_CONFIGURED (FAIL) or "not provided/run". A real Gemini has never judged a real diff (no real PASS, no real FAIL).
  2. P7 — the real E2E ran on a /tmp sandbox repo, not a registered repo, and had no real Gemini verdict. The planner (claude), the coder (codex, real tokens), and the worktree diff were real; the validator leg was not.
  3. README overclaims "external Gemini validator … all passed", and its lower two-thirds still describe a deleted Python dual-kernel + Agent-Mesh/Sentinel as "production grade", contradicting docs/PARKED.md.

The plan (H1–H4, each = one commit + one PR, actually pushed)

  • H1 — wire AEDEV_GEMINI_API_KEY into the daemon (validator only; coder/planner stay key-stripped) and make a real Gemini produce one FAIL and one PASS on real diffs, with verdict artifacts under evidence/.
  • H2 — run the full conversational loop on a registered repo (hermus-agent), not /tmp, with a real Gemini verdict.
  • H3 — fix the README overclaim and reconcile the stale Python/Agent-Mesh body with docs/PARKED.md.
  • H4 — small fixes (stale DAG comment, memory into planner, dedup memory noise) + confirm PR-flow closure.
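H1 scopes the Gemini key to the validator only. A minimal sketch of that scoping, assuming the daemon launches each role as a child process with its own environment: only `AEDEV_GEMINI_API_KEY` comes from this workbook, while the role names and the `child_env` / `validator_env` helpers are hypothetical, not the daemon's actual code.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: the variable name comes from the workbook; the role names and
# these helpers are hypothetical, not the daemon's actual code.
GEMINI_KEY_VAR = "AEDEV_GEMINI_API_KEY"
KEY_ALLOWED_ROLES = frozenset({"validator"})


def child_env(role: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the environment for a child process running `role`.

    Returns a fresh copy, so the caller's mapping (or os.environ) is never
    mutated. Every role except the validator gets the Gemini key stripped.
    """
    env = dict(os.environ if base is None else base)
    if role not in KEY_ALLOWED_ROLES:
        env.pop(GEMINI_KEY_VAR, None)
    return env


def validator_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Environment for the validator; fails closed if the key is absent or empty."""
    env = child_env("validator", base)
    if not env.get(GEMINI_KEY_VAR):
        # Mirrors the GEMINI_NOT_CONFIGURED status: no key means no verdict,
        # never an implicit PASS.
        raise RuntimeError("GEMINI_NOT_CONFIGURED")
    return env
```

In use, a launcher would pass `env=child_env("coder")` or `env=validator_env()` to `subprocess.run`. Stripping on a copy, rather than deleting from `os.environ`, keeps the key available to the validator launch that follows.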

Process rules this enforces (fixing last cycle's lapses)

  • G1 every phase is its own commit + PR, verified on origin (last cycle squashed 8 phases into one [P7] commit and never pushed).
  • G2 no "validator passed" claim without a real Gemini verdict artifact in evidence/.
  • G4 PR flow, not direct-to-main (last cycle pushed a7d400f straight to main).
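G2, together with H1's "one FAIL and one PASS" criterion, can be checked mechanically. A minimal sketch, assuming a hypothetical artifact schema in which each verdict is a JSON object under evidence/ with `validator`, `mock`, and `verdict` fields. The real artifacts specified in WORKBOOK_v3.1.md may look different, and the function names are illustrative.

```python
import json
from pathlib import Path
from typing import Set

# Assumption: a hypothetical artifact schema. Each verdict is a JSON object under
# evidence/ such as {"validator": "gemini", "mock": false, "verdict": "PASS"}.
# The real artifacts defined by WORKBOOK_v3.1.md may use different fields.


def real_gemini_verdicts(evidence_dir: Path) -> Set[str]:
    """Collect verdicts from artifacts produced by a real (non-mock) Gemini run."""
    verdicts: Set[str] = set()
    for path in sorted(evidence_dir.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # unreadable or malformed files never count as evidence
        if not isinstance(data, dict):
            continue
        # Fail closed: an artifact that does not explicitly say mock=false is
        # treated as a mock run and ignored.
        if data.get("validator") != "gemini" or data.get("mock", True) is not False:
            continue
        verdict = data.get("verdict")
        if verdict in ("PASS", "FAIL"):
            verdicts.add(verdict)
    return verdicts


def may_claim_validator_passed(evidence_dir: Path) -> bool:
    """G2: a 'validator passed' claim needs at least one real Gemini PASS artifact."""
    return "PASS" in real_gemini_verdicts(evidence_dir)


def h1_verdicts_complete(evidence_dir: Path) -> bool:
    """H1: a real Gemini must have produced both a FAIL and a PASS."""
    return {"PASS", "FAIL"} <= real_gemini_verdicts(evidence_dir)
```

Treating a missing `mock` field as a mock run is deliberate. It matches the gap the audit found, where mock-only results were read as a passing validator.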

See WORKBOOK_v3.1.md §3 for the full per-phase L1 acceptance criteria.

🤖 Generated with Claude Code

Next-cycle workbook that closes the two gaps verified by the P0-P7 audit, a REAL Gemini verdict (P4 was only mock-tested; a real Gemini never judged a real diff) and a real E2E on a registered repo (P7 ran on a /tmp sandbox). It also restores README honesty (removing the overstated 'Gemini validator all passed' claim and the stale Python dual-kernel body) and adds a hard rule that every phase = one commit + one PR actually pushed to origin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>