Skip to content

chore(repo): add plan-execution housekeeper narration-mode guardrails#53

Merged
Sawmonabo merged 5 commits into
developfrom
chore/plan-execution-narration-guardrails
May 11, 2026
Merged

chore(repo): add plan-execution housekeeper narration-mode guardrails#53
Sawmonabo merged 5 commits into
developfrom
chore/plan-execution-narration-guardrails

Conversation

@Sawmonabo
Copy link
Copy Markdown
Owner

Summary

Phase E plan-execution-housekeeper subagent had a documented failure (PR #36, #42, #45, #51 — discovered during PR #51 housekeeping on 2026-05-11) where the model emitted Tool: Edit\n{'file_path': '...'} as text content instead of invoking the tool API, returning RESULT: DONE with totalToolUseCount: 0. Prior orchestrators silently applied the edits directly with no validator signal — fragile and undetectable. This PR adds defense-in-depth so the failure surfaces deterministically and the orchestrator routes through a documented auto-deviation fallback.

What changed

  • validateManifestSubagentStage check docs(repo): resolve Tier 1-3 plan governance audit findings #13 (narration_mode_detected) — fires when manifest shape matches the script-stage verbatim (no result, empty semantic_edits + concerns, no subagent_completed_at) AND _script_stage.semantic_work_pending is non-empty. Gap message names totalToolUseCount, lists the four historical PRs, and routes to the SKILL.md auto-deviation fallback.
  • validate-subagent-manifest.mjs CLI wrapper — new orchestrator-facing script. Exit codes: 0 valid · 1 narration_mode → auto-deviation · 2 other gaps → round-trip · 3 invocation error. Stdout is single-line JSON for jq-piping; stderr carries the human-readable gap list.
  • SKILL.md Phase E — step 3 snapshots stage-1 manifest to a sidecar .agents/tmp/housekeeper-stage1-PR<N>.json before subagent dispatch (untamperable baseline for preservation checks); step 5 rewritten as a mechanical CLI invocation with exit-code routing. New § Subagent narration auto-deviation fallback section codifies the 4-step deterministic workaround (apply edits directly → rewrite manifest with kind: orchestrator_applied_semantic_edits_due_to_subagent_narration concern → re-validate → advance). Explicitly notes deviation does NOT require user authorization (validator exit-1 is the gate).
  • plan-execution-housekeeper.md — rewritten frontmatter + opener with action-verb framing ("edit files, then write the manifest, then return RESULT"). New ## Failure-mode warning documents the post-mortem; new ## First action (mandatory) and ## Required tool sequence enforce "first action MUST be Read on the manifest." Hard Rules tightened: no-narration ban, first-Read mandate, clarification that <TODO subagent prose> replacement requires Edit against actual files (not just recording in semantic_edits).
  • plan-execution-contract-author.md — preemptive narration-mode warning. Shares the housekeeper's tool set (Read/Grep/Glob/Edit/Write) + abstract framing, so it sits at the same risk surface. First concrete tool invocation MUST be Read on a target_paths entry.
  • 5 regression tests in post-merge-housekeeper-orchestrator-helpers.test.mjs:
    • PR feat(contracts): add PtyHost interface (Plan-024 T-024-2-1) #45 verbatim manifest replay → narration gap fires + asserts mentions of pending-item count, totalToolUseCount, auto-deviation fallback, and the concern kind
    • 4 negative cases: populated semantic_edits, legitimate BLOCKED halt, vacuous (no pending work), non-canonical truthy result

Test plan

  • node --experimental-strip-types --test .claude/skills/plan-execution/scripts/__tests__/post-merge-housekeeper-orchestrator-helpers.test.mjs113/113 pass (5 new + 108 pre-existing)
  • Local pre-commit hooks clear (eslint, prettier, docs-anchor-check, docs-corpus-checks, gitleaks, commitlint)
  • Audit of prior housekeeping commits — grep of <TODO subagent prose> across docs/architecture/cross-plan-dependencies.md + docs/plans/*.md returns 0 occurrences (zero latent missing edits; prior orchestrators applied semantic edits directly).
  • CI green (typecheck, lint, test, build, gitleaks)

Review Notes

  • Agent definition changes in .claude/agents/ require a Claude Code session restart to activate (per SKILL.md iteration caveat). Skill-side changes (helpers, SKILL.md, tests, CLI wrapper) are live-reloaded.
  • No production V1 code touched; this is dev-tooling hardening for the /plan-execution skill.

Refs: Plan-024 PR #51 housekeeping post-mortem (2026-05-11)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 844cdb0be0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +48 to +51
if (args[i] === "--stage1" && args[i + 1]) {
stage1Path = args[i + 1];
i++;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject --stage1 when no snapshot path is provided

The new CLI parser silently ignores a trailing --stage1 with no value, because the flag is only handled when args[i + 1] is truthy. In that case the script continues as if no baseline was passed, which changes routing from an invocation error (exit 3) to normal validation flow (exit 1/2) and can trigger the wrong remediation path. This should be treated as malformed invocation and fail fast so orchestration bugs are surfaced deterministically.

Useful? React with 👍 / 👎.

Comment on lines +402 to +403
const hasPendingWork =
Array.isArray(cleanedSemanticWorkPending) && cleanedSemanticWorkPending.length > 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Base narration detection on script-stage pending items

The narration-mode guard currently checks pending work only from manifest.semantic_work_pending (subagent-emitted), not from the trusted script-stage snapshot. If a bad subagent response clears the top-level pending list while still leaving result non-canonical and no edits/concerns, this check won’t fire and the orchestrator falls back to generic round-trip behavior instead of the dedicated auto-deviation path. Using the resolved script-stage pending set here would preserve the intended deterministic routing.

Useful? React with 👍 / 👎.

- narration check sources hasPendingWork from effScriptSemanticWorkPending
  (trusted stage-1 baseline) instead of manifest.semantic_work_pending
  (subagent-emitted). A bad subagent clearing the top-level array no longer
  demotes routing from narration auto-deviation to generic round-trip.
- CLI wrapper rejects --stage1 with no path argument (exit 3) instead of
  silently ignoring it. Adds regression test for the explicit-param >
  _script_stage precedence chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb3914ec42

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .claude/skills/plan-execution/SKILL.md Outdated
Comment on lines +489 to +490
cp .agents/tmp/housekeeper-manifest-PR<N>.json \
.agents/tmp/housekeeper-stage1-PR<N>.json # if not done already at step 3
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove post-dispatch stage-1 snapshot copy in step 5

This cp runs inside Phase E step 5 (after subagent dispatch), so if step 3’s pre-dispatch snapshot was missed and someone follows this block literally, --stage1 will point to a copy of the already-mutated stage-2 manifest. That destroys the trusted baseline and lets preservation checks (#7/#9/#10/#11) compare the manifest against itself, which can hide subagent removal of script-stage affected_files, schema_violations, or verification_failures.

Useful? React with 👍 / 👎.

Comment thread .claude/skills/plan-execution/SKILL.md Outdated
Comment on lines +487 to +488
# `--stage1` below — the validator falls back to manifest._script_stage with a
# baseline-trust gap (non-fatal but signals reduced preservation coverage).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat missing --stage1 as fatal in routing docs

The guidance says omitting --stage1 is “non-fatal,” but the validator wrapper exits nonzero for any non-narration gap, and check #12 always emits a baseline-trust gap when stage-1 arrays are not explicitly plumbed. In practice this means dropping --stage1 cannot advance step 5 and forces a round-trip loop instead of degraded-but-successful validation.

Useful? React with 👍 / 👎.

@Sawmonabo
Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- step 5 no longer contains a `cp manifest stage1` fallback. By step 5 the
  manifest has been mutated by the subagent dispatch, so the copy would
  alias the stage-1 baseline to tampered state and silently disable
  preservation checks #7/#9/#10/#11. (Codex P1)
- step 3 snapshot promoted from "optional but recommended" to REQUIRED,
  with the cp command moved into a fenced code block. Drops the misleading
  "non-fatal but signals reduced preservation coverage" claim — omitting
  --stage1 actually forces a check #12 round-trip loop (exit 2). (Codex P2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Sawmonabo
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 783684d74b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +127 to +129
if (narrationDetected) process.exit(1);
if (!result.valid) process.exit(2);
process.exit(0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid forced process exit before stdout flushes

Replace the unconditional process.exit(...) calls with process.exitCode = ... (or wait for stream-drain callbacks) because this script writes its machine-readable JSON to stdout immediately beforehand. In piped/slow-output contexts (... | jq, CI log collectors, or loaded runners), Node can terminate before async stdout writes finish, which intermittently truncates the JSON payload and breaks orchestrator parsing even when validation succeeded.

Useful? React with 👍 / 👎.

Wrap the script body in main() returning the exit code, then assign
`process.exitCode = main()`. The previous unconditional `process.exit(N)`
after `process.stdout.write(JSON)` could terminate Node before async
stdout writes drained when stdout is piped (e.g., `... | jq`, CI log
collectors), intermittently truncating the JSON payload and breaking
orchestrator parsing even when validation succeeded. (Codex PR #53 R3 P1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Sawmonabo
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 617754b36e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +135 to +136
if (narrationDetected) return 1;
if (!result.valid) return 2;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate exit-1 routing on narration being the only gap

This wrapper always returns exit code 1 whenever narration_mode_detected is present, even if other gaps are also present (for example the baseline-trust gap raised when --stage1 plumbing is missing). In that mixed-gap case, Phase E is routed into the auto-deviation path instead of the required round-trip/halt path, so the orchestrator can proceed using subagent-emitted snapshot data that the validator explicitly marks as untrusted. Exit 1 should be emitted only when narration is the sole actionable failure class (or at least when no higher-priority integrity gaps are present).

Useful? React with 👍 / 👎.

Comment thread .claude/skills/plan-execution/SKILL.md Outdated
Comment on lines +517 to +518
2. **Rewrite the manifest.** Set `result: "DONE_WITH_CONCERNS"`, populate `semantic_edits` with one entry per pending item (each carrying a short `summary` of what changed + the file + line), and add a `concerns` entry of the form:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep BLOCKED result when fallback starts from halt states

The fallback now unconditionally instructs the orchestrator to set result: "DONE_WITH_CONCERNS", but Phase E also dispatches on script exit classes that require BLOCKED semantics (e.g., when schema_violations or verification_failures are present). In those contexts, the validator enforces result === "BLOCKED", so following this instruction causes deterministic re-validation failure and can trap the flow in avoidable retry loops; the fallback should set BLOCKED whenever those halt-condition arrays are non-empty.

Useful? React with 👍 / 👎.

- CLI exit-1 routing gated on definitional-narration gaps only (Codex
  R4 P1). The narration scenario inherently fires 3 gaps (result
  non-canonical, narration_mode_detected, per-item unaddressed). Exit 1
  must NOT win when a non-definitional integrity gap (baseline-trust
  check #12, preservation #7/#9/#10/#11, halt-state #8, schema-surfacing
  #5, placeholder leak, etc.) also fires — auto-deviation off untrusted
  _script_stage would compound the bug. Uses an allow-list of three
  patterns; new gap classes default to integrity (exit 2).
- Auto-deviation fallback recipe (SKILL.md §) now sets result
  conditionally on _script_stage halt-state arrays (Codex R4 P2).
  schema_violations or verification_failures non-empty → BLOCKED;
  otherwise DONE_WITH_CONCERNS. Hard-coding DONE_WITH_CONCERNS would
  deterministically re-fail check #8 and trap the flow.
- Adds child_process-spawned regression test exercising both routing
  paths (narration-only → exit 1; narration + --stage1 omitted → exit 2
  via baseline-trust gap). 115/115 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Sawmonabo
Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Sawmonabo Sawmonabo merged commit 8b1bf4e into develop May 11, 2026
8 checks passed
@Sawmonabo Sawmonabo deleted the chore/plan-execution-narration-guardrails branch May 11, 2026 22:50
Sawmonabo added a commit that referenced this pull request May 12, 2026
…55)

Two related fixes that unblock plan-execution-housekeeper and
plan-execution-contract-author dispatches.

YAML registration: Claude Code's session-start parser handled
`tools: ["Read", ...]` flow-form inconsistently — same byte format
parsed as a real array for some agents, as a literal string for these
two, leaving zero tools registered ("No such tool available"). Switch
to block-list YAML form so the parser registers the tool array
reliably.

Narration anti-pattern: agent bodies contained literal `Tool: <Name>`
strings in failure-mode warnings and "do not output X" sections.
Opus 4.7's literal instruction following caused the model to emit
those tokens as text content instead of invoking the tool API
(totalToolUseCount: 0 across PR #36 / #42 / #45 / #51 dispatches).
Replace with positive-only action contracts; let the orchestrator's
existing validator (check #13, narration_mode_detected) and
auto-deviation fallback handle the defense-in-depth.

Housekeeper cleanup: consolidate duplicate `## First action (mandatory)`
into `## Manifest contents` so each heading has a single, distinct
purpose.

Validation: post-restart smoke tests exercise both agents end-to-end
(housekeeper edits target.md + writes manifest preserving _script_stage;
contract-author writes a Zod schema file). All 393 plan-execution skill
tests pass (including the buildHousekeeperPrompt snapshot test that
pins the orchestrator's runtime brief), all 411 package tests pass,
typecheck and lint clean.

Builds on the guardrails landed in #53.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant