feat: elevate codex with claude depth and shared runtime features by mumit · Pull Request #1 · mumit/codex-dev-team

mumit · 2026-05-01T23:08:00Z

Summary

Two commits forming one logical elevation effort:

feat: deepen claude-dev-team parity — closes the v1.0.0 audit gaps: documents the stage-renumbering divergence, fleshes out role prompts (24 → 100-170 lines each), adds the Stage 0 safety stoplist + budget-gate config + async-checkpoint config, hardens approval-derivation.js with file locking and atomic writes, ports the audit-phases reference, and replaces the row-existence parity check with a deep content-depth check.
feat: elevate codex with claude depth and shared runtime features — ports claude-dev-team's prose depth AND adds three new runtime features neither framework had.
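
The first commit mentions hardening `approval-derivation.js` with file locking and atomic writes. The sketch below shows the general pattern only, not the code in that file: the helper names `acquireLock` and `writeFileAtomic`, the `.lock` suffix, and the temp-file naming are all assumptions made for illustration.

```javascript
const fs = require("node:fs");

// Hypothetical helper: take an exclusive lock by creating `<target>.lock`.
// The "wx" flag makes openSync fail with EEXIST if the file already exists.
function acquireLock(lockPath) {
  try {
    return fs.openSync(lockPath, "wx");
  } catch (err) {
    if (err.code === "EEXIST") return null; // another writer holds the lock
    throw err;
  }
}

// Hypothetical helper: write to a temp file next to the target, then rename
// it over the target so readers never observe a half-written file.
function writeFileAtomic(targetPath, contents) {
  const lockPath = `${targetPath}.lock`;
  const fd = acquireLock(lockPath);
  if (fd === null) return false; // caller decides whether to retry

  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  try {
    fs.writeFileSync(tmpPath, contents);
    fs.renameSync(tmpPath, targetPath); // atomic on POSIX within one filesystem
    return true;
  } finally {
    fs.closeSync(fd);
    fs.rmSync(lockPath, { force: true });
    fs.rmSync(tmpPath, { force: true }); // no-op after a successful rename
  }
}
```

Returning `false` when the lock is taken leaves the retry policy to the caller; a real hook might instead wait and retry, or detect stale locks.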

Prose depth ported from claude

.codex/rules/pipeline.md 145 → 393 lines (review shape scoped/matrix, READ-ONLY Reviewer Rule, gate merge strategy, review round limit, stage durations, parallelism)
.codex/rules/gates.md 74 → 280 lines (per-stage extra-field examples)
.codex/rules/coding-principles.md 62 → 151 lines
.codex/rules/execution-profiles.md 17 → 106 lines (full local/app-worktree/cloud model with parallelism patterns)

New narrative artifacts

EXAMPLE.md — 218-line end-to-end pipeline walkthrough (codex-ized password-reset feature)
CHANGELOG.md — versions v1.0.0, v1.1.0 (this), v1.2.0 (unreleased placeholder)
CONTRIBUTING.md — local dev setup, test/lint/parity commands, PR conventions, stage-numbering note

New runtime features (neither framework had these)

| Feature | Script | What it does |
| --- | --- | --- |
| Budget tracking | `scripts/budget.js` | Honors `budget.enabled` in `.codex/config.yml`; writes `pipeline/budget.md`; emits `stage-budget.json ESCALATE` on overrun (or warns). `init/update/check` subcommands. |
| Async-checkpoint auto-pass | `applyCheckpointAutoPass()` in `codex-team.js` | Honors `checkpoints.{a,b,c}.auto_pass_when` config (`no_warnings`, `all_criteria_passed`); writes `CHECKPOINT-AUTO-PASS:` line to context. Stoplist override prevents auto-pass on security-sensitive runs. |
| Pipeline visualization | `scripts/visualize.js` | Generates a Mermaid `stateDiagram-v2` of the active pipeline run, color-coded by gate status (PASS/FAIL/ESCALATE/missing). Writes `pipeline/diagram.md`. |
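
To make the auto-pass rule concrete, here is a minimal sketch of the decision it describes. It is not the body of `applyCheckpointAutoPass()`: the `run` object (`stoplistTriggered`, `warnings`, `criteria`), the function names, and the text after the `CHECKPOINT-AUTO-PASS:` prefix are assumptions for illustration. Only the config key and the two condition names come from the table above.

```javascript
// Hypothetical shapes: `config` mirrors the `checkpoints.{a,b,c}.auto_pass_when`
// key; `run` is an invented summary of the current pipeline run.
function shouldAutoPass(config, checkpoint, run) {
  const condition = config?.checkpoints?.[checkpoint]?.auto_pass_when ?? null;
  if (condition === null) return false;    // default: a human must approve
  if (run.stoplistTriggered) return false; // stoplist override always wins

  switch (condition) {
    case "no_warnings":
      return run.warnings.length === 0;
    case "all_criteria_passed":
      return run.criteria.length > 0 && run.criteria.every((c) => c.passed);
    default:
      return false;                        // unknown condition: stay manual
  }
}

// Returns the context line to append, or null when the checkpoint stays manual.
// The text after the prefix is an assumed format.
function autoPassLine(config, checkpoint, run) {
  if (!shouldAutoPass(config, checkpoint, run)) return null;
  const condition = config.checkpoints[checkpoint].auto_pass_when;
  return `CHECKPOINT-AUTO-PASS: ${checkpoint} (${condition})`;
}
```

Checking the stoplist before the condition means a security-sensitive run can never auto-pass, regardless of how the checkpoint is configured.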

Test rigor

125 → 169 tests across 16 suites. New tests:

tests/budget.test.js — 15 tests (init, update, check escalate/warn paths, disabled-mode no-op)
tests/checkpoints.test.js — 15 tests (each condition + null default + stoplist override)
tests/visualize.test.js — 14 tests (empty/active/complete pipelines, valid Mermaid syntax)

Plus all earlier deepening tests (parity-check main() + mutation tests, role-prompt line-count checks, config-key validation, audit-phases reference).

Stage numbering preserved

Codex keeps its collapsed numbering (Stage 5 = pre-review with security_review_required flag). The translation table to claude's Stage 4.5a/4.5b lives in docs/parity/claude-dev-team-parity.md under "Stage Numbering Divergence".

Test plan

npm test — all 169 pass
npm run doctor — all PASS
npm run parity:check — passes (deep check)
npm run lint — passes
npm run budget -- init — works (no-op when disabled)
npm run visualize — writes pipeline/diagram.md with valid Mermaid
npm run pipeline -- "Test feature" — workspace bootstraps cleanly
npm run next — track-aware advancement still works
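
For readers unfamiliar with the output of `npm run visualize`, the sketch below shows one way a color-coded Mermaid `stateDiagram-v2` could be produced from gate statuses. It is not taken from `scripts/visualize.js`: the `renderDiagram` function, the stage record shape (`id`, `label`, `status`), the class names, and the colors are all assumptions.

```javascript
// Hypothetical input: ordered stages whose `status` is "PASS", "FAIL",
// "ESCALATE", or null when no gate file exists yet.
const STATUS_CLASS = { PASS: "pass", FAIL: "fail", ESCALATE: "escalate" };

function renderDiagram(stages) {
  const lines = [
    "stateDiagram-v2",
    "  classDef pass fill:#c8e6c9",
    "  classDef fail fill:#ffcdd2",
    "  classDef escalate fill:#ffe0b2",
    "  classDef missing fill:#eeeeee",
  ];
  if (stages.length > 0) {
    lines.push(`  [*] --> ${stages[0].id}`);
  }
  stages.forEach((stage, i) => {
    lines.push(`  ${stage.id} : ${stage.label}`);
    if (i + 1 < stages.length) {
      lines.push(`  ${stage.id} --> ${stages[i + 1].id}`);
    }
    // Any status without a known class (including null) is drawn as "missing".
    lines.push(`  class ${stage.id} ${STATUS_CLASS[stage.status] ?? "missing"}`);
  });
  return lines.join("\n");
}
```

An empty stage list yields only the header and class definitions, which keeps the output valid Mermaid for a pipeline that has not started.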

🤖 Generated with Claude Code

Address audit findings where v1.0.0 had structural parity but shallow content parity. Document the stage-renumbering divergence, port behavioral content from claude agents into role prompts, add the safety stoplist + budget gate + async checkpoints from claude's Stage 0, harden the approval-derivation hook with file locking and atomic writes, port the audit-phases reference, and replace the row-existence parity check with a deep content check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Port claude-dev-team's pipeline rules depth (review shape, READ-ONLY rule, gate merge strategy, round limit, durations, parallelism), gates per-stage extra-field examples, coding principles, and execution profiles. Add EXAMPLE.md walkthrough (218 lines), CHANGELOG.md, and CONTRIBUTING.md. Implement three runtime features neither framework had: budget tracking (scripts/budget.js), async-checkpoint auto-pass logic, and Mermaid pipeline visualization (scripts/visualize.js). 44 new tests; 169/169 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When two gate files land in the same millisecond (common on fast CI filesystems), mtime-only sort is unstable and may pick the wrong "latest" gate. Add filename localeCompare as a stable secondary sort so latest-mode validation is reproducible. Resolves CI flake on tests/gate-validator.test.js:164 "validates every gate when requested".
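
A minimal sketch of the tie-break described in this commit, assuming gate records of the form `{ file, mtimeMs }` (a hypothetical shape; `mtimeMs` is the field `fs.statSync` reports). When two records share an mtime, the comparator falls through to `localeCompare` on the filename, so the result no longer depends on the order in which the files were listed.

```javascript
// Hypothetical gate records: `file` is the gate filename, `mtimeMs` its
// modification time in milliseconds.
function sortGatesOldestFirst(gates) {
  return [...gates].sort(
    // Primary key: mtime. Secondary key: filename, used only on mtime ties.
    (a, b) => a.mtimeMs - b.mtimeMs || a.file.localeCompare(b.file)
  );
}

// Returns the most recent gate, or null when there are none.
function latestGate(gates) {
  const sorted = sortGatesOldestFirst(gates);
  return sorted.length > 0 ? sorted[sorted.length - 1] : null;
}
```

With the secondary key in place, passing the same set of gates in any order returns the same latest gate.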

mumit and others added 3 commits May 1, 2026 13:25

mumit merged commit e388c52 into main May 1, 2026
4 checks passed

mumit mentioned this pull request May 2, 2026

docs: refresh documentation for v1.2.0 elevation #2

Merged

4 tasks

mumit merged 3 commits into main from codex/elevate-with-claude-depth
