feat(repo): housekeeper subagent + Phase E orchestrator wiring#33
Conversation
Adds the 7th plan-execution subagent role per spec §5.2: blue color, tools Read/Grep/Glob/Edit/Write (Bash omitted to mechanically enforce Plan Invariant I-3 — no git shell-out from this role). Body sections mirror plan-execution-contract-author.md shape: Inputs, Mindset, Hard rules, Decision presentation, Exit states (only the 4 canonical from failure-modes.md per Plan Invariant I-2), Report format, Reference files. Refs: Plan-housekeeper Phase 4 Task 4.1; Spec §5.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The contract is the housekeeper subagent's portable reference; sections extract spec §5.3 manifest schema, §5.1 exit codes, §5.3 validation invariants, §7.2 resume diagnostic, §3a.2 completion-rule matrix, §3a.4 file-reference extraction heuristic. Adds the canonical subagent prompt template that buildHousekeeperPrompt (Task 4.3) reproduces verbatim and the Layer 2 snapshot test (Task 4.8) pins. Refs: Plan-housekeeper Phase 4 Task 4.2; Spec §9.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Canonical Subagent Prompt Template (Task 4.2 verbatim source + deployed contract) cited `rule 21` for the sprawl-routing case, but Task 4.7 defines rule 20 = sprawl (round-trip dispatch) and rule 21 = schema-violation halt → BLOCKED. Also re-frames the routing prose: sprawl triggers an orchestrator round-trip (DONE_WITH_CONCERNS is a secondary outcome only after weak-justification re-dispatch, not the primary route). Without this fix, Task 4.8's snapshot test would pin the wrong rule number into CI fixtures. Fixes both the canonical contract and the plan's verbatim source so a re-run of Task 4.2 cannot reintroduce the bug. Refs: Plan-housekeeper Phase 4 Task 4.2; Task 4.7 rule 20. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds .claude/skills/plan-execution/lib/housekeeper-orchestrator-helpers.mjs exporting buildHousekeeperPrompt (BASE template from contract + conditional AUTO-CREATE / schema-violation augmentations) and validateManifestSubagentStage (catches: pending-item gaps, file-residual placeholders, semantic_edits values P2 placeholders, nested-array placeholders, schema_violations x concerns reconciliation gaps). 8 unit tests TDD-authored. Refs: Plan-housekeeper Phase 4 Task 4.3; Spec §5.3, §8.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-quality review POLISH items: lift `<TODO subagent prose>` literal to module scope (single source of truth across walkForPlaceholder + validateManifestSubagentStage), and reap Test 6's tmp dir via try/finally + rmSync to match sibling test convention. Refs: Plan-execution-housekeeper Phase 4 Task 4.3
Phase E now describes the 8-step post-merge housekeeping flow: candidate-lookup over §6 NS catalog, script dispatch (--candidate-ns / --auto-create / NEEDS_CONTEXT on multi-match), script-stage manifest validation, plan-execution-housekeeper subagent dispatch, subagent-stage validation, Progress Log append (moved from before-merge to after-housekeeping per spec §6.1), single bundled commit, push. Per Plan §Decisions-Locked D-3 the housekeeper ticks ALL Done-Checklist boxes for the merged Phase. Footnote at step 1 cross-links to Plan §D-7 fixture matrix. Refs: spec §5.4, §6.1, §6.2, §7.1; Plan-execution-housekeeper Phase 4 Task 4.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan line 2923 prescribes "A new bullet at the top of Phase E noting ..." but the Task 4.4 rewrite rendered the D-3 line as a paragraph. Prepend the list marker so the rendering matches the plan's "new bullet" framing. Refs: Plan-execution-housekeeper Phase 4 Task 4.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POLISH from Task 4.4 code-quality review: - Step 6 path drifted to gitignored `.agents/tmp/<session-id>/progress.md`, which is incompatible with step 7's "single git commit on develop" (spec §6.1 line 905 mandates same-commit). Restore plan-body `docs/plans/NNN-*.md` § Progress Log location. - GFM footnote ref `Rules:[^d7]` needs space before `[^d7]`. Mirrored fix in plan source per Task 4.2 rule-20 precedent (commit b5f1e10) so re-runs don't reintroduce the path drift. Refs: Plan-execution-housekeeper Phase 4 Task 4.4
Task 4.5 of plan-execution-housekeeper Phase 4: bumps the orchestrator's role count six → seven, adds `plan-execution-housekeeper` to all three role enumerations (line 29 introduction, line 42 no-git rule, line 515 Reference Files subagent-definitions row), and adds a Reference Files row pointing to `references/post-merge-housekeeper-contract.md` (authored in Task 4.2, commit b5f1e10's predecessor). Set-quantifier sweep includes lines 42 and 515 alongside the explicit line 29 from the plan task spec — per same-class-sweep discipline, the "Six subagent roles" claim mirrors at all three sites. Refs: Plan-execution-housekeeper Phase 4 Task 4.5
state-recovery.md: append "Resuming a Phase E housekeeping halt" diagnostic (per spec §7.2) and mirror the four §4.3.2 candidate-lookup rules inline (per spec §4.3.5) so a recovering engineer doesn't need to cross-link to SKILL.md to debug rule-N mismatches. Flag the existing line-174 .agents/tmp/ lifecycle drift with a \`> Note:\` block linking to BL-109. backlog.md: reserve BL-109 to track reconciling state-recovery.md's ".agents/ tmp/ deleted at commit time" claim with lefthook.yml (which has no such job). Refs: spec §6.1, §7.2, §4.3.5, §9.3; Plan §Decisions-Locked D-5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POLISH from Task 4.6 code-quality review: an inbound anchor in state-recovery.md (the > Note: callout authored alongside BL-109) points to the GitHub-slugified BL-109 heading. The heading contains backticks + dots + slashes that get stripped by GFM, so any future heading reword could silently break the inbound anchor without any hook catching it. Add an HTML comment above the heading documenting the dependency, mirroring the PR #24 / PR #27 canonicalization-sweep lessons. Refs: Plan-execution-housekeeper Phase 4 Task 4.6
Adds rules 20 (housekeeper subagent affected_files sprawl detection + round-trip resolution) and 21 (script schema-violation BLOCKED routing reusing rule-4's graceful-drain) to failure-modes.md. Both reuse the existing four canonical exit-states (DONE / DONE_WITH_CONCERNS / NEEDS_CONTEXT / BLOCKED) per spec §7.1 invariant; no new exit-state introduced. Refs: spec §7.1; Plan-execution-housekeeper Phase 4 Task 4.7
POLISH from Task 4.7 review: 4 citation sites used "§ Exit Codes" (capital C) while the contract heading at references/post-merge-housekeeper-contract.md:70 is "## Exit codes" (lowercase). Sweep direction lowercases the citers (not capitalizes the heading), because sentence-case matches: (a) all 5 sibling ## headings in post-merge-housekeeper-contract.md (b) sibling references/preflight-contract.md:13 "## Exit codes" Sites swept (4): - references/failure-modes.md:206 (rule 21, just landed in bc9c03b) - references/state-recovery.md:198 (Task 4.6 commit 5f0d04b) - plan source line 2989 (state-recovery.md verbatim mirror) - plan source line 3120 (failure-modes.md verbatim mirror) Refs: Plan-execution-housekeeper Phase 4 Task 4.7
Adds 7 unit tests to post-merge-housekeeper-orchestrator-helpers.test.mjs covering Plan §Decisions-Locked D-7 rows 12-15 (prompt-template snapshot, zod schema parse, affected_files superset, sprawl detection routing) and the I-1/I-2/I-3 invariants (cite-target hook regression, canonical exit- state declaration, no-shell-out for git). Adds new helper detectAffectedFilesSprawl to housekeeper-orchestrator- helpers.mjs (pure function, set-difference) and extends validateManifestSubagentStage with an optional scriptAffectedFiles param that enforces the D-7 row 14 superset check. Adds zod ^4.3.6 to root devDependencies so the schema test resolves the import. Adds the RESULT: prefix to the four canonical exit-states in plan-execution-housekeeper.md so the I-2 invariant test detects all four declarations consistently. Refs: Plan-execution-housekeeper Phase 4 Task 4.8
Plan §Task-4.9 Step 1 prescribed `node --test <directory>/` but Node 22.21 does NOT directory-walk explicit path args — only auto-discovers from cwd. The directory-arg was treated as a module path and threw MODULE_NOT_FOUND. Use `<directory>/*.test.mjs` glob instead. Also corrects the expected-test-count line: Task 4.8 added 7 tests (4 D-7 rows + 3 invariants), not 4 D-7 rows alone. Refs: Plan-execution-housekeeper Phase 4 Task 4.9
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa932d4567
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The validator's reconciliation loop used `.some(c => c.kind === "schema_violation")` which ignored `sv` and let a single generic concerns entry satisfy N violations. Fix: filter concerns by kind once, then match each schema_violations entry structurally on `field` (and `ns_id` when present). Update the gap-message to reference the actual differentiator (e.g. "NS-15.type") instead of the always-literal `sv.kind`. Adds three regression tests: - 2 violations + 1 generic concern → exactly the unmatched 'type' violation gaps - 2 violations + 2 specific concerns → passes - --candidate-ns shape (no ns_id) → matches by `field` alone Refs: PR #33 Codex review thread r3198337816 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 027ef550a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P1 (PR #33 R2): two gaps in the subagent-stage validator forced false-positive round-trips on legitimate halt outcomes. Gap 1 — `addressing: <item-key>` undocumented. The canonical-template responsibility list never told the subagent that concerns deferring a `semantic_work_pending` item must set `addressing: "<exact-item-key>"` to satisfy the validator's per-item check. The match key was a hidden contract. Gap 2 — per-item check fired on halt-states. When subagent returned BLOCKED or NEEDS_CONTEXT (it stopped before completing semantic work), the per-item check still demanded coverage of every pending item, forcing wasted dispatches. Fix: - Contract doc canonical template adds responsibility #5 documenting the `addressing: <item-key>` requirement (renumbers existing #5/#6/#7 -> #6/#7/#8). Validation invariants line clarifies the matching key + halt-state waiver. - JS CANONICAL_TEMPLATE const re-synced verbatim per Plan §Decisions-Locked D-1; D-7 row 12 snapshot test confirms sync. - validateManifestSubagentStage skips the per-item loop when manifest.result is "BLOCKED" or "NEEDS_CONTEXT". DONE_WITH_CONCERNS still requires per-item pairing — waiver narrowly scoped to halts. - Gap message now hints at the contract requirement so future regressions self-document. Tests: 4 new regression tests (BLOCKED waiver, NEEDS_CONTEXT waiver, DONE_WITH_CONCERNS still requires pairing, `addressing` match by exact item-key with `kind` irrelevant). Note: Codex Finding 2 (missing merge-command steps in SKILL.md Phase E) is a deeper architectural drift between spec (pre-merge housekeeping) and plan/SKILL.md (post-merge ordering) — deferred to user direction; not addressed here. Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6208124855
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P2 (PR #33 R3): the schema_violations × concerns matcher's predicate `c.field !== sv.field` was trivially false when both sides lacked a `field` (undefined !== undefined), letting a single generic concern absorb multiple distinct-shape violations. Plus the pre-filter `c.kind === "schema_violation"` excluded shapes the script actually emits — `auto_create_title_seed_underivable` (line 899) is a singleton with no `field`/`ns_id`, so it would never find a matching concern. Fix: - Validator drops the kind="schema_violation" pre-filter and adds `c.kind === sv.kind` to the per-violation match predicate. Each violation must now be paired to a concern of the SAME kind. - Gap message handles fieldless violations by falling back to a `(kind: <kind>)` label (was printing literal "undefined") and reflects the actual match-key requirements. - Contract doc (validation invariants line 93 + canonical template responsibility #4) updated to describe the actual three-shape rule (schema_violation OR auto_create_title_seed_underivable singleton) rather than hardcoding `kind: "schema_violation"`. - JS CANONICAL_TEMPLATE const re-synced verbatim per Plan §Decisions-Locked D-1; D-7 row 12 snapshot test confirms sync. - Validator JSDoc check #4 expanded to call out the kind discriminator and the trivial-equality vulnerability it closes. Tests: 3 new regression tests (distinct-kind violations require kind-discriminated concerns, singleton kind matches by kind alone, fieldless-violation gap message format). Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77f44354ad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P1 (PR #33 R3 Finding 4): .claude/agents/plan-execution-housekeeper.md line 41 still mandated `kind: schema_violation` literally even after commit 77f4435 changed the validator to discriminate by sv.kind verbatim. The Finding 3 fix updated the contract doc + JS const + validator, but missed the same-axis prose in 4 other locations across the doc tree — classic canonicalization-sweep miss per `feedback_canonicalization_sweep_completeness.md`. Sweep covers all parallel hard-rule prose + invariants: - .claude/agents/plan-execution-housekeeper.md:41 (Codex flagged direct) - docs/superpowers/specs/.../housekeeper-design.md:617 (spec hard rule) - docs/superpowers/specs/.../housekeeper-design.md:693 (validation invariants) — also brings the invariants prose into alignment with Finding 1's `addressing: <item-key>` + halt-state waiver wording (was stale from the prior PR cycle's runtime fix) - docs/superpowers/specs/.../housekeeper-design.md:953 (failure-mode table) - docs/superpowers/plans/.../housekeeper-implementation.md:2477 (plan hard rule) Skipped (out of canonicalization-sweep scope): - Plan source's prescribed canonical template (lines 2570-2595) - Plan source's prescribed validator code (lines 2785-2839) These are historical implementation guides describing what the implementer was supposed to write at PR-construction time. Subsequent fixes (Findings 1 and 3) evolved the implementation; updating prescription post-hoc is plan-archeology, not contract-alignment. Note: Codex Finding 5 (PRRT_kwDOSCycWc6ALNW_) re-raises the architectural drift originally flagged as Finding 2 (PRRT_kwDOSCycWc6AK48r) — both about missing pre-merge / merge-step commands in SKILL.md Phase E. That question is OUT OF SCOPE for this commit; still awaits user direction on Path A (restructure SKILL.md + plan Task 4.4 per spec §4.1/§6.2 ordering) vs Path B (add D-8 Decision Log + reconcile line-45 contradiction). Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dac2396634
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Validator now enforces two contract invariants from
references/post-merge-housekeeper-contract.md line 93 that prior
rounds missed:
- result !== null + ∈ {DONE, DONE_WITH_CONCERNS, NEEDS_CONTEXT, BLOCKED}:
blocks any non-canonical exit state (was: undefined slipped through).
Closes Codex finding PRRT_kwDOSCycWc6ALVFW (P2).
- schema_violations.length > 0 ⇒ result === "BLOCKED":
surfacing alone is insufficient; the contract requires the BLOCKED
exit state for halt/routing-path determinism.
Closes Codex finding PRRT_kwDOSCycWc6ALVFN (P1).
Tests: 25 → 30 (added 5 cases — 3 canonical-state, 2 BLOCKED-on-violations).
Existing fixtures gained explicit `result` fields so they assert under
the new enforcement intentionally rather than passing by accident.
JSDoc on validateManifestSubagentStage now enumerates 7 numbered checks
in code order; the new canonical-state check is #1 and the new
BLOCKED-when-schema-violations check is #6.
D-7 row 12 snapshot test (canonical template ↔ JS const verbatim sync)
unchanged and continues to pin contract↔JS sync.
Refs: PR #33
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8deaf8155e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Validator's pending-item reconciliation used hasOwnProperty.call() which only checks key presence. A subagent returning semantic_edits.compose_status_prose = undefined would pass the check despite zero payload, letting DONE/DONE_WITH_CONCERNS slip through with unresolved semantic work and skipping a needed round-trip. Tightened to require a meaningful payload: not null/undefined, not empty string (after trim), not empty array, not empty object. Numbers/booleans accepted defensively for future contract evolution. Closes Codex finding PRRT_kwDOSCycWc6ALgop (P2). Tests: 30 → 35 (added 4 empty-variant cases + 1 negative control). Helper isMeaningfulPayload extracted to module scope alongside walkForPlaceholder for testability and single-source-of-truth. JSDoc check #2 now clarifies the value-must-be-meaningful rule and cites the contract clause "containing the composed output" from canonical-template responsibility #5. Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8bae2f15d0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Validator's placeholder-scan loop silently skipped any affected_files
entry that no longer existed on disk (existsSync gate). A subagent that
deleted a declared file (e.g. cross-plan-dependencies.md) would pass
validation despite the contract clause that affected_files are files
the subagent EDITED — implying they exist post-edit.
Loop now pushes a gap when existsSync is false ("declared in
affected_files but missing from disk — destructive out-of-scope
behavior") AND continues (skipping the placeholder read for the
missing file). Existing-file branch unchanged.
Closes Codex finding PRRT_kwDOSCycWc6ALnyo (P1).
Tests: 35 → 36 (added missing-file gap; negative case covered by
existing placeholder-scan fixtures).
Contract doc clause at line 93 augmented with explicit parenthetical:
deletion of a declared file is a contract violation surfaced by the
validator's missing-file gap.
JSDoc check #3 now mentions both placeholder-scan AND existence-check
since both live in the same loop.
Refs: PR #33
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bec222acc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex F-AMU4D (PR #33 P2): the placeholder-scan loop called readFileSync unconditionally after the existsSync gate, throwing EISDIR on directory paths and any other I/O error, crashing the orchestrator instead of routing through the validator's gap collection. Add statSync().isFile() guard + try/catch around statSync and readFileSync; each failure path emits a distinct gap citing F-AMU4D so the failure mode is identifiable. The contract requires affected_files entries to be regular files (the script edits line-level content); a directory is a contract violation that MUST surface as a gap so Phase E can re-dispatch — not an unhandled exception that halts orchestration. Regression test: a directory at the affected_files path now emits a gap matching "is not a regular file" + "Codex F-AMU4D" rather than crashing with EISDIR. Tests: 48 helpers (47 + 1 new) + 116 main pass.
Codex F-AMWXK (PR #33 P1): validateManifestSubagentStage iterated manifest.semantic_work_pending (subagent-written), so a subagent could clear that array + return DONE/DONE_WITH_CONCERNS — the per-item pairing loop ran zero times and emitted zero gaps, letting unaddressed semantic work pass validation. SAME bypass shape as F-AMP7Y (schema_violations) + F-AMSEL (verification_failures). STRUCTURAL DIVERGENCE from checks #9/#10: those guard array-non-emptiness → BLOCKED state with a separate preservation check. F-AMWXK leverages the EXISTING per-item iteration by feeding it the UNION of script-stage snapshot + subagent-written array. The script snapshot is the immutable contract (subagent CANNOT bypass by clearing); the subagent's additions are also iterated (subagent committed to address anything they added). Why union not preservation-check: a separate preservation check would emit "X missing from semantic_work_pending" — forcing 3 round-trips (R1 array-shrink gap → R2 re-add → R3 address). Union semantics keep the existing "X listed but unaddressed" gap actionable in ONE round-trip. Add scriptSemanticWorkPending parameter (defaults null for legacy callsites + tests). Updated JSDoc check #11 documenting the new parameter, the union semantics, and the structural divergence. Regression tests (3 — pin both directions of the boundary plus the union semantics): - cleared semantic_work_pending → 2 gaps for both script-stage items - preserved + addressed → valid - subagent-added items also iterated → 1 gap for unaddressed addition Tests: 51 helpers (48 + 3 new) + 116 main pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bec222acc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Path-(a) A2 design closes the system-level half of the contract-half-fix bypass class (F-AMP7Y / F-AMSEL / F-AMWXK). Validator-API parameters were in place but no production caller plumbed them, so the four preservation / iteration checks (#7 / #9 / #10 / #11) silently disabled. Script now writes _script_stage as an immutable snapshot of the four arrays the validator enforces (affected_files / schema_violations / verification_failures / semantic_work_pending). Validator reads the snapshot from the manifest with precedence: explicit scriptXXX param > manifest._script_stage[field] > null. Structural-tampering check fires whenever the snapshot path is relied on (any scriptXXX null), catching secondary attacks where the subagent removes _script_stage to defeat the snapshot defense. Subagent contract: _script_stage is READ-ONLY. Touching it surfaces as a Codex F-AMbIV gap. Documented in the canonical subagent prompt template, the agent definition file, the contract doc § _script_stage immutability, and SKILL.md Phase E step 5. Tests: 4 new F-AMbIV regressions (A2 fallback / missing-snapshot gap / malformed-snapshot gap / param precedence). All 171 housekeeper tests pass (55 helper + 116 main). 14 fixture snapshots updated to mirror the new emission. 42 existing test fixtures bulk-updated with empty _script_stage to satisfy the structural check on legacy code paths. Refs: PR #33 Codex finding F-AMbIV
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7bec222acc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
phase e step 4 dispatched the housekeeper subagent unconditionally, but the contract classifies exits 1/4/>=6 as orchestrator-stage halts (operator action required, not subagent work). dispatching over a malformed/absent manifest forced the llm to interpret garbage and emit a result tag based on hallucinated state. encode the dispatch/halt mapping in a tested helper (decideHousekeeperRouting) so skill prose, contract, and any future audit script delegate to one source of truth, mirroring the f-amp7y/f-amsel/ f-amwxk/f-ambiv pattern (move enforcement out of prose, into validators). 10 unit tests pin the mapping per exit class (0/1/2/3/4/5/6/137/-1/nan). Refs: PR #33, F-ANGCa Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skill prose, agent rules, contract reference, and the canonical subagent prompt template all carried `Codex F-AMbIV` / `Codex F-ANGCa` / `Codex P1 PR #33` review-thread suffix labels. these are github graphql ids with no semantic content for any reader outside the originating pr review tab — they violate the global rule against referencing the current task / fix / caller in code (claude.md), and they bleed review-process metadata into the housekeeper subagent's prompt every dispatch. the rule's actual rationale already lives in the surrounding prose; the f-tag added zero behavioral signal while consuming prompt tokens. scope is bounded to runtime-loaded surfaces (skill prose, agent rule list, contract reference, canonical_template constant). validator gap messages and jsdoc annotations stay — they're operator-facing / engineer-facing, not loaded into the housekeeper subagent's prompt context. the breadcrumb trail to source threads remains in commit messages and test names. Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 902314e2be
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Validator's affected_files loop joined declared paths against repoRoot without first checking they were repo-relative. Malformed manifests emitting an absolute path (/etc/passwd) or parent-traversal (../../private) bypassed validation; later step-7 git add fails with "outside repository pathspec", dead-ending housekeeping after a valid:true signal. Encoded containment in assertRepoRelative (lexical isAbsolute + relative startsWith ".."), called BEFORE existsSync. 7 unit tests cover helper branches + validator integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6966fd9ef3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Validator's affected_files loop forwarded each entry to assertRepoRelative (isAbsolute), which throws TypeError on non-string input. The schema_violations reconciliation + preservation loops both dereferenced v.kind/field/ns_id without per-element shape check, so a tampered manifest with null or non-object entries crashed mid-validation instead of surfacing structural-tampering as a gap. Sanitize once at the top (cleanedSchemaViolations) and add typeof guard before assertRepoRelative (idx-keyed gap message); downstream loops iterate the cleaned arrays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d69e5f7522
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Closes the (manifest.X ?? []).method() bug class across all 6 callsites in the validator. Two sub-classes addressed: (1) container-type — array fields received an OBJECT/scalar instead of an array, crashing .entries()/.some()/.map() with "is not a function"; (2) element-shape — arrays of objects received null/scalar elements, crashing field dereference (.kind/.field/.addressing) with TypeError. Pre-fix sites: schema_violations (recon + preserve), concerns (pending-pair + recon), affected_files (scan + D-7 superset), verification_failures (preserve). Centralized in sanitizeObjectArrayField + sanitizeStringArrayField helpers; consumers iterate cleaned arrays. Same gap-routing pattern as F-AMU4D / F-Axg-z: contract violations surface as gaps with contract-anchored shape hints, not orchestrator crashes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 48f21daac1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Removed two classes of ephemeral PR-thread identifiers leaked into runtime
test/code/comment surfaces under .claude/skills/plan-execution/:
- F-tag suffixes (F-AMSEL/F-AMU4D/F-AMP7Y/F-AMNGCa/F-AMbIV/F-AMWXK/
F-Axg-w/F-Axg-z/F-AxXBH/F-AxkNI/F-AxkNJ/F-AMLor/F-ANGCa)
- Full GraphQL thread IDs (PRRT_kwDO...) and review permalink IDs
(Codex thread rNNNNNNNN — three call-sites)
Both classes describe Codex review-thread provenance: base64 GraphQL IDs
that are ephemeral (rot when the PR is closed/squashed) and opaque (no
resolution path for a future reader). Per the standing rule, these
identifiers belong in commit messages and PR thread replies, never in
runtime-loaded test/code/comment/gap-message surfaces.
Replaced identity-based test assertions (/Codex F-XXX/.test(g)) with
behavior-based assertions (e.g., g.includes("script-stage
verification_failure")) so future refactors that rename gap-message text
don't silently break test coverage on a stale identifier match.
Known residual: Bug-N markers (Bug-4/5/8/9) remain inline in the same
files. They're a separate internal-triage convention without external
docs/backlog cross-references; stripping them was out of scope for this
authorization and is deferred as follow-up.
Test count unchanged: 254/254 pass.
Refs: PR #33
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2715e011c2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P1 finding on housekeeper-orchestrator-helpers.mjs:349 — the `[...effScript, ...(manifest.semantic_work_pending ?? [])]` union spread crashes "object is not iterable" when a malformed subagent manifest sets `semantic_work_pending` to a non-array. The `??` only coalesces null/ undefined; an object/scalar short-circuits the coalesce, returns the bad value, and the spread throws. Bypasses the intended re-dispatch path — bad manifest structure becomes orchestrator failure instead of a recoverable validation gap. Same bug class as the prior class-level fix (sanitize helpers for schema_violations / concerns / verification_failures / affected_files), missed at the spread surface for semantic_work_pending. The earlier fix scope only covered method-call sites (`(arr ?? []).method()`); this commit extends to the spread surface. Fix: add cleanedSemanticWorkPending = sanitizeStringArrayField( manifest.semantic_work_pending, ...) upstream alongside the other sanitize calls; route both consumption sites through the cleaned array. Inline asymmetry note: ONLY the manifest field is sanitized; effScriptSemanticWorkPending is upstream-resolved via Array.isArray (script-stage = trusted, manifest = subagent-output = untrusted). Test added (integration test mirrors the AxkNJ proof pattern): manifest with _script_stage.semantic_work_pending populated and tampered manifest.semantic_work_pending produces BOTH a structural-tampering gap AND the canonical "X listed but absent" gap on the script-stage pending item — proves a tampered live field doesn't mask preservation enforcement. Test count: 255/255 pass (was 254; +1 new test for this finding). Refs: PR #33 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f20bdc249e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Sanitize elements of `effScriptVerificationFailures` and `effScriptSchemaViolations` BEFORE the preservation iteration loops via the existing `sanitizeObjectArrayField` helper. The container-type guard in check #12 only verifies the field is an array — element-shape was unguarded, so a tampered `manifest._script_stage.verification_failures` containing `[null]` crashed `Object.keys(null)` inside the `failureKey` closure (and `manifest._script_stage.schema_violations` containing `[null]` TypeError'd on `sv.kind` dereference). Same fix shape as the manifest-side sanitize already in place for the four object-array fields. The companion finding (manifest `_script_stage` as immutable baseline) is deferred — see thread reply for architectural rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1ce845260
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Same class as the F-XXXX review-thread suffix cleanup (commit 2715e01). Bug-N labels were ephemeral session-tracking identifiers used during the initial finding cycle for this PR — they have no meaning outside the originating review session and rot the moment the PR squashes. Replaced each occurrence with a description of the bug class or contract violation the test/branch defends; preserved all surrounding rationale. Touches: - post-merge-housekeeper.test.mjs (Bug-9, Bug-8, Bug-4 ×2, Bug-5 headers + 1 test name) - post-merge-housekeeper.mjs (Bug-4 ×2 inline pre-validation comments) 256/256 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When validateManifestSubagentStage runs without all four scriptXXX params and falls back to manifest._script_stage, the validator now emits an explicit gap. _script_stage is subagent-emitted and may be tampered; the orchestrator's stage-1 conversation memory snapshot (read in SKILL.md step 3 BEFORE subagent dispatch) is the only untamperable baseline. Falling back silently allowed preservation bypass when the subagent cleared both the top-level emit field and the snapshot field while keeping shape intact. SKILL.md step 5, the housekeeper contract, the agent prompt, and the embedded CANONICAL_TEMPLATE all updated to describe the orchestrator-plumbs-stage-1-snapshot trust model. _script_stage is demoted from "immutable" to "redundant integrity signal" — the primary defense is the validator's gap, the snapshot is defense- in-depth. Test fixtures updated to plumb scriptXXX explicitly (mirroring the production path); two new tests cover the baseline-trust gap positive (omit scriptXXX → gap fires) and negative (all four plumbed → suppressed) cases. Refs: PR #33 finding F-AyDCN Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 682b3c4ecc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Multi-candidate runs (--candidate-ns NS-XX,NS-YY) emit a divergent
manifest shape: matched_entry is null with matched_entries[] carrying
per-NS metadata, and mechanical_edits.{status_flip,mermaid_class_swap}
are absent (omitted entirely) with plural status_flips[] /
mermaid_class_swaps[] arrays in their place. Canonical fixture is
scripts/__tests__/fixtures/14-multi-candidate-happy-path.
The contract previously described only the single-candidate shape, and
the canonical subagent prompt template's responsibility #1 told the
housekeeper to replace `<TODO subagent prose>` placeholders only in
mechanical_edits.status_flip.to_line — a subagent following the prompt
literally on a multi-candidate run would skip the status_flips[]
entries. The validator's on-disk placeholder scan would have caught
the omission on the next round-trip (the safety net is shape-agnostic),
but the wasted round-trip was avoidable.
Updated responsibility #1 in both helpers.mjs CANONICAL_TEMPLATE and
the contract's §Canonical Subagent Prompt Template (kept verbatim in
sync for the buildHousekeeperPrompt parity test) to cover both shapes,
added a multi-candidate-shape paragraph to the Manifest schema section,
and updated the validation-invariants line to acknowledge the plural
keys.
Refs: PR #33 finding F-A2u4r
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df5a798f21
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
A malformed subagent output (JSON literal `null`, undefined, scalar, or array root) would crash on the first `manifest.X` dereference at helpers.mjs L214 before gap collection ran, throwing TypeError instead of returning a structured validation gap. That broke Phase E's contract-violation recovery path — the orchestrator caught a runtime crash instead of a routable gap, and could not re-dispatch / surface the contract violation to the user. Added an entry guard at function start that catches null, undefined, non-object scalars, and array roots. Discriminates on actualKind for actionable messaging and short-circuits with a structured gap so downstream code can trust manifest is a non-null non-array object. Four dedicated tests cover null / undefined / scalar (number+string) / array roots. 89/89 pass. Refs: PR #33 finding F-A24wj Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`902314e` swept review-thread suffixes from runtime-loaded surfaces (skill
prose, agent rules, contract reference, canonical prompt template) but
missed a single `F-A24wj:` prefix in a Layer-2 unit-test comment. The tag
is a Codex GraphQL ID with no semantic content for any reader outside the
originating PR review tab; the surrounding prose already names the bug
class (manifest = null short-circuit gap) without the tag.
Detection: `grep -rE 'F-A[A-Za-z0-9]{4,}|Codex F-' .claude/ packages/ scripts/`
now returns zero hits across runtime-loaded surfaces.
Comment is a pure edit — no test logic touched; 108-test file + full
386-test plan-execution skill suite pass locally.
Refs: PR #33
Summary
plan-execution-housekeeper(color: blue; tools: Read, Grep, Glob, Edit, Write).references/post-merge-housekeeper-contract.mdwith manifest schema, exit codes, validation invariants, recovery diagnostic, completion-rule matrix, file-reference extraction heuristic.state-recovery.md: Phase E housekeeping recovery section (per spec §7.2) + inline candidate-lookup rules (per spec §4.3.5) + flag for the line-174.agents/tmp/lifecycle drift (deferred reconciliation per spec §9.3, tracked as BL-109).failure-modes.md: routing rules 20 + 21 added — rule 20 coversaffected_filessprawl round-trip, rule 21 covers schema-violation BLOCKED routing reusing rule 4 (no new exit-state introduced per spec §7.1 invariant).detectAffectedFilesSprawlinhousekeeper-orchestrator-helpers.mjs(pure set-difference function for routing rule 20).Refs
docs/superpowers/specs/2026-05-03-plan-execution-housekeeper-design.md§5.2, §5.3, §5.4, §6.1, §6.2, §7.1, §7.2, §7.3, §9.1, §9.3, §10.1docs/superpowers/plans/2026-05-03-plan-execution-housekeeper-implementation.md§Phase 4Test plan
node .claude/skills/plan-execution/scripts/__tests__/run-subagent-against-fixtures.mjsagainst fixtures 01 + 05 + 09. Note: Layer 3 runner is NOT in this PR's scope (spec §8.3 marks it manual + non-CI).🤖 Generated with Claude Code