WenyuChiou · WenyuChiou · May 21, 2026 · May 21, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,32 @@ marketplace; see that repo's CHANGELOG for the catalog-side history.
 
 ## [Unreleased]
 
+### Added
+
+- `references/codex-prompt-blocks.md` — composable XML prompt blocks
+  (`<verification_loop>`, `<grounding_rules>`, `<action_safety>`, …), four
+  task recipes, and a prompt anti-pattern table for judgment-sensitive briefs.
+  Adapted from the `gpt-5-4-prompting` skill in
+  [`openai/codex-plugin-cc`](https://github.com/openai/codex-plugin-cc)
+  (Apache-2.0); reframed for this skill's single-shot brief-file workflow.
+
+### Changed
+
+- `SKILL.md` and `references/task-template.md` now point judgment-sensitive
+  tasks (debugging, write-capable changes, review, research) at the new
+  prompt-blocks reference; mechanical sweeps keep the flat template.
+- `references/patterns.md` (Pattern 5) and `references/review-checklist.md`:
+  added an explicit "do not auto-apply fixes from a review run — present
+  findings and ask the user first" rule.
+- `references/model-selection.md`: documented the `spark` shorthand
+  (`--model gpt-5.3-codex-spark`).
+
+### Fixed
+
+- `references/patterns.md` (Pattern 5): the review-mode code block used the
+  deprecated `--full-auto` flag; replaced with `--sandbox workspace-write` to
+  match `SKILL.md`, `wrapper.md`, and `examples.md`.
+
 ## [0.1.0] - 2026-05-15
 
 The initial published version. Captures the skill state at commit

diff --git a/skills/codex-delegate/SKILL.md b/skills/codex-delegate/SKILL.md
@@ -76,7 +76,7 @@ Full routing table and good/bad examples: `references/delegation-targets.md`.
 
 ## Workflow
 
-1. **Brief**: write `.ai/codex_task_<name>.md` with Context / Goal / Constraints / Acceptance. Template: `references/task-template.md`. If the brief was already written by `agent-task-splitter` at `.ai/codex_task_<NNN>_<slug>.md`, read `.coord/plan.yml` for round context first.
+1. **Brief**: write `.ai/codex_task_<name>.md` with Context / Goal / Constraints / Acceptance. Template: `references/task-template.md`. For a judgment-sensitive task (debugging, write-capable change, review, research) — where an unsupported guess or a half-finished fix would hurt — also add the XML prompt blocks from `references/codex-prompt-blocks.md` to the Goal/Constraints. A pure mechanical sweep does not need them. If the brief was already written by `agent-task-splitter` at `.ai/codex_task_<NNN>_<slug>.md`, read `.coord/plan.yml` for round context first.
 
 2. **Run**: from Claude Code Bash, invoke the wrapper from its install location:
    ```bash
@@ -112,6 +112,7 @@ Full routing table and good/bad examples: `references/delegation-targets.md`.
 - `references/delegation-targets.md` — when to use vs avoid
 - `references/wrapper.md` — full wrapper invocation, env vars, Windows runner notes
 - `references/task-template.md` — task brief template
+- `references/codex-prompt-blocks.md` — XML prompt blocks + recipes + anti-patterns for judgment-sensitive briefs
 - `references/output-contract.md` — full `.result.json` schema, status semantics, `.fallback_claude` quota sentinel
 - `references/review-checklist.md` — extended acceptance gate
 - `references/patterns.md` — five single-task delegation shapes (context file, parallel, resume, structured output, review mode)

diff --git a/skills/codex-delegate/references/codex-prompt-blocks.md b/skills/codex-delegate/references/codex-prompt-blocks.md
@@ -0,0 +1,248 @@
+# Codex prompt blocks — phrasing the task so Codex performs well
+
+`task-template.md` defines the *shape of the brief file* (Context / Goal /
+Constraints / Acceptance). This file defines the *phrasing layer*: a set of
+composable XML-tagged blocks you drop **inside** the brief's Goal and
+Constraints when the task is judgment-sensitive.
+
+For a pure mechanical sweep (batch rename, docstring add) the flat template is
+enough — do not bolt these blocks on. Reach for them when an unsupported guess,
+a half-finished fix, or an unrelated refactor would actually hurt the result:
+debugging, write-capable changes, review, research.
+
+> Adapted from the `gpt-5-4-prompting` skill in
+> [`openai/codex-plugin-cc`](https://github.com/openai/codex-plugin-cc)
+> (Apache-2.0). Reframed for this skill's single-shot brief-file workflow.
+
+## Where the blocks go
+
+The wrapper runs `--prompt "Read .ai/codex_task_<name>.md and execute..."`, so
+Codex reads the brief file. XML tags inside that markdown file are read fine.
+Put the blocks under the brief's Goal (what done looks like) or Constraints
+(how to stay safe). Keep them compact — a better contract beats raising the
+reasoning effort or padding the prompt with prose.
+
+Core rules:
+
+- One clear task per Codex run. Split unrelated asks into separate runs.
+- Tell Codex what *done* looks like; do not assume it infers the end state.
+- Add a block only where the task needs it. Remove redundant ones before sending.
+- Use the exact tag names below so briefs stay consistent across runs.
+
+## Block library
+
+Wrap each block in the XML tag shown. Pick the smallest set that fits.
+
+### `task` — use in nearly every brief
+
+```xml
+<task>
+The concrete job, the relevant repo or failure context, and the expected end state.
+</task>
+```
+
+### `structured_output_contract` — when the response shape matters
+
+```xml
+<structured_output_contract>
+Return exactly the requested output shape and nothing else.
+Keep it compact. Put the highest-value findings or decisions first.
+</structured_output_contract>
+```
+
+### `compact_output_contract` — concise prose instead of a schema
+
+```xml
+<compact_output_contract>
+Keep the final answer compact and structured.
+No long scene-setting, no repeated recap.
+</compact_output_contract>
+```
+
+### `default_follow_through_policy` — when Codex should act, not ask
+
+```xml
+<default_follow_through_policy>
+Default to the most reasonable low-risk interpretation and keep going.
+Only stop to ask when a missing detail changes correctness, safety, or an
+irreversible action.
+</default_follow_through_policy>
+```
+
+### `completeness_contract` — multi-step work that must not stop early
+
+```xml
+<completeness_contract>
+Resolve the task fully before stopping. Do not stop at the first plausible
+answer. Check for follow-on fixes, edge cases, or cleanup needed for a
+correct result.
+</completeness_contract>
+```
+
+### `verification_loop` — when correctness matters
+
+```xml
+<verification_loop>
+Before finalizing, verify the result against the task requirements and the
+changed files or tool outputs. If a check fails, revise — do not report the
+first draft.
+</verification_loop>
+```
+
+### `missing_context_gating` — when Codex might otherwise guess
+
+```xml
+<missing_context_gating>
+Do not guess missing repository facts. If required context is absent, retrieve
+it with tools or state exactly what remains unknown.
+</missing_context_gating>
+```
+
+### `grounding_rules` — review, research, root-cause analysis
+
+```xml
+<grounding_rules>
+Ground every claim in the provided context or your tool outputs.
+Do not present inferences as facts. Label any hypothesis clearly.
+</grounding_rules>
+```
+
+### `action_safety` — write-capable or potentially broad tasks
+
+```xml
+<action_safety>
+Keep changes tightly scoped to the stated task. Avoid unrelated refactors,
+renames, or cleanup unless required for correctness. Call out any risky or
+irreversible action before taking it.
+</action_safety>
+```
+
+### `research_mode` — exploration, comparisons, recommendations
+
+```xml
+<research_mode>
+Separate observed facts, reasoned inferences, and open questions.
+Prefer breadth first, then go deeper only where evidence changes the answer.
+</research_mode>
+```
+
+### `dig_deeper_nudge` — review and adversarial inspection
+
+```xml
+<dig_deeper_nudge>
+After the first plausible issue, check for second-order failures, empty-state
+behavior, retries, stale state, and rollback paths before finalizing.
+</dig_deeper_nudge>
+```
+
+### `progress_updates` — long-running, tool-heavy runs
+
+```xml
+<progress_updates>
+Keep progress updates brief and outcome-based. Mention only major phase
+changes or blockers.
+</progress_updates>
+```
+
+## Recipes
+
+Copy the smallest recipe that fits, then trim. These slot into the brief's
+Goal/Constraints; the brief still carries Context (file lists) and Acceptance
+(verification commands) per `task-template.md`.
+
+### Narrow fix
+
+```xml
+<task>
+Implement the smallest safe fix for the identified issue. Preserve existing
+behavior outside the failing path.
+</task>
+<structured_output_contract>
+Return: 1. summary of the fix  2. touched files  3. verification performed
+4. residual risks or follow-ups
+</structured_output_contract>
+<completeness_contract>
+Resolve the task fully. Do not stop after identifying the issue without
+applying the fix.
+</completeness_contract>
+<verification_loop>
+Before finalizing, verify the fix matches the requirements and the changed
+code is coherent.
+</verification_loop>
+<action_safety>
+Keep changes tightly scoped. Avoid unrelated refactors or cleanup.
+</action_safety>
+```
+
+### Diagnosis (read-only — pair with a read-only run)
+
+```xml
+<task>
+Diagnose why the failing test or command breaks in this repository. Identify
+the most likely root cause.
+</task>
+<compact_output_contract>
+Return: 1. most likely root cause  2. evidence  3. smallest safe next step
+</compact_output_contract>
+<missing_context_gating>
+Do not guess missing repository facts. State exactly what remains unknown.
+</missing_context_gating>
+<verification_loop>
+Before finalizing, verify the proposed root cause matches the observed evidence.
+</verification_loop>
+```
+
+### Root-cause review
+
+```xml
+<task>
+Analyze this change for the most likely correctness or regression issues.
+Focus on the provided repository context only.
+</task>
+<structured_output_contract>
+Return: 1. findings ordered by severity  2. supporting evidence per finding
+3. brief next steps
+</structured_output_contract>
+<grounding_rules>
+Ground every claim in the repo context or tool outputs. Label inferences.
+</grounding_rules>
+<dig_deeper_nudge>
+Check second-order failures, empty-state handling, retries, stale state, and
+rollback paths before finalizing.
+</dig_deeper_nudge>
+```
+
+### Research / recommendation
+
+```xml
+<task>
+Research the available options and recommend the best path for this task.
+</task>
+<structured_output_contract>
+Return: 1. observed facts  2. reasoned recommendation  3. tradeoffs
+4. open questions
+</structured_output_contract>
+<research_mode>
+Separate observed facts, reasoned inferences, and open questions.
+</research_mode>
+```
+
+## Prompt anti-patterns
+
+These are about *prompt phrasing*, distinct from the brief-file anti-patterns
+in `task-template.md` (vague goals, missing scope fence).
+
+| Anti-pattern | Bad | Fix |
+|---|---|---|
+| Vague task framing | "Take a look at this and let me know what you think." | Wrap a concrete job in `<task>`. |
+| Missing output contract | "Investigate and report back." | Add `structured_output_contract` or `compact_output_contract`. |
+| No follow-through default | "Debug this failure." (Codex stalls asking) | Add `default_follow_through_policy`. |
+| Asking for more reasoning | "Think harder and be very smart." | Add `verification_loop` — a contract beats a pep talk. |
+| Mixing unrelated jobs | "Review this diff, fix the bug, update docs, suggest a roadmap." | One job per run; split into separate runs. |
+| Unsupported certainty | "Tell me exactly why production failed." | Add `grounding_rules` so inferences stay labeled. |
+
+## Cross-references
+
+- `task-template.md` — the brief-file shape these blocks slot into
+- `patterns.md` — the five delegation shapes (which recipe pairs with which)
+- `review-checklist.md` — Claude's acceptance gate after the run
diff --git a/skills/codex-delegate/references/model-selection.md b/skills/codex-delegate/references/model-selection.md
@@ -33,6 +33,20 @@ Both produced correct, runnable code. The semantic difference is style: `gpt-5.5
 - Wall-time pressure (interactive iteration, TDD-style loops).
 - Time-to-first-byte matters more than the final-byte quality.
 
+## The `spark` shorthand
+
+`gpt-5.3-codex-spark` is a low-latency, low-cost Codex model. There is no alias
+resolution in the wrapper — it passes `--model` straight through — so the
+shorthand is a documentation convention: when the user (or a brief) says
+**`spark`**, invoke the wrapper with `--model gpt-5.3-codex-spark`
+(`-Model gpt-5.3-codex-spark` in PowerShell).
+
+Reach for `spark` when the task is trivially mechanical and latency dominates —
+a fast first pass, a TDD-style loop, or a sweep where even `gpt-5.4` is more
+model than the edit needs. For anything where idiomatic style or subtle
+correctness matters, stay on the default. Confirm the exact name your CLI
+exposes with `codex models`.
+
 ## How to A/B another task in your project
 
 ```bash

diff --git a/skills/codex-delegate/references/patterns.md b/skills/codex-delegate/references/patterns.md
@@ -132,11 +132,17 @@ Steps:
 2. Run review mode:
 
    ```bash
-   codex exec review --full-auto </dev/null
+   codex exec review --sandbox workspace-write </dev/null
    ```
 
 3. Read the review output as a hint, not a verdict. Claude still owns the acceptance decision and runs verification.
 
+**Do not auto-apply fixes from a review run.** After presenting the findings,
+stop. Surface them to the user and ask which, if any, they want fixed before
+touching a single file — even when a fix looks obvious. A review produces a
+list of *candidate* issues, not an approved work order; turning it straight
+into edits skips the user's prioritisation call.
+
 Review mode is cheaper than re-running the full implementation pattern when you only want a sanity check.
 
 ---

diff --git a/skills/codex-delegate/references/review-checklist.md b/skills/codex-delegate/references/review-checklist.md
@@ -34,3 +34,11 @@ Before accepting a Codex run, Claude must verify each of these:
 ## Decision
 
 If any answer is *no*, fix the task file or take the rest locally in Claude. Do not paper over a bad run.
+
+## Review-mode runs (Pattern 5)
+
+This checklist gates *acceptance of a delegated implementation*. A **review
+mode** run (`patterns.md` Pattern 5) is different: it produces candidate
+findings, not a diff to accept. After presenting those findings, stop — do not
+auto-apply fixes. Ask the user which issues they want addressed before
+touching any file, even when a fix looks obvious.
diff --git a/skills/codex-delegate/references/task-template.md b/skills/codex-delegate/references/task-template.md
@@ -43,3 +43,15 @@ Save the brief at `.ai/codex_task_<name>.md`. If the task is part of a multi-age
 - Missing scope fence (Codex will then "improve" unrelated files)
 - Acceptance written as prose instead of executable verification
 - Asking Codex to "decide" between alternatives instead of just executing
+
+These are the failure modes of the *brief file*. The failure modes of the
+*task phrasing* (missing output contract, "think harder", mixing unrelated
+jobs) are separate — see `codex-prompt-blocks.md`.
+
+## Judgment-sensitive tasks
+
+For debugging, write-capable changes, review, or research, the flat template
+above is not enough on its own. Add the composable XML prompt blocks
+(`<verification_loop>`, `<grounding_rules>`, `<action_safety>`, …) from
+`codex-prompt-blocks.md` to the Goal/Constraints. A pure mechanical sweep
+does not need them.