feat: elevate codex with claude depth and shared runtime features #1
Merged
Address audit findings where v1.0.0 had structural parity but shallow content parity. Document the stage-renumbering divergence, port behavioral content from claude agents into role prompts, add the safety stoplist + budget gate + async checkpoints from claude's Stage 0, harden the approval-derivation hook with file locking and atomic writes, port the audit-phases reference, and replace the row-existence parity check with a deep content check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
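The "file locking and atomic writes" hardening mentioned above can be illustrated with a minimal sketch. This is not the code from the approval-derivation hook: the helper names `withLock` and `writeFileAtomic`, the `.lock` suffix, and the `approval.json` target are all hypothetical. It assumes a lock implemented as an exclusively created lock file and a write implemented as temp-file-then-rename on the same filesystem.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: take an advisory lock by exclusively creating a lock
// file. The 'wx' flag makes openSync throw EEXIST if the file already exists.
function withLock(lockPath, fn) {
  const fd = fs.openSync(lockPath, 'wx');
  try {
    return fn();
  } finally {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
  }
}

// Hypothetical helper: write to a temp file next to the target, then rename
// it over the target so readers see either the old or the new content.
function writeFileAtomic(target, data) {
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, target);
}

// Usage in a throwaway directory (paths are illustrative only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'approval-'));
const target = path.join(dir, 'approval.json');
withLock(`${target}.lock`, () => {
  writeFileAtomic(target, JSON.stringify({ approved: true }));
});
console.log(fs.readFileSync(target, 'utf8')); // {"approved":true}
```

A second process that tries to take the same lock while it is held fails fast with `EEXIST` instead of interleaving its write. A real hook would likely also need a retry or stale-lock policy, which this sketch omits.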
Port claude-dev-team's pipeline rules depth (review shape, READ-ONLY rule, gate merge strategy, round limit, durations, parallelism), gates per-stage extra-field examples, coding principles, and execution profiles. Add EXAMPLE.md walkthrough (218 lines), CHANGELOG.md, and CONTRIBUTING.md. Implement three runtime features neither framework had: budget tracking (scripts/budget.js), async-checkpoint auto-pass logic, and Mermaid pipeline visualization (scripts/visualize.js). 44 new tests; 169/169 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
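As a rough illustration of the async-checkpoint auto-pass logic, the sketch below evaluates one of the two condition names used by this PR (`no_warnings`, `all_criteria_passed`) and returns a `CHECKPOINT-AUTO-PASS:` line or `null`. Everything else is an assumption made for the example: the function name `checkpointAutoPass`, the gate shape (`warnings`, `criteria`), the `stoplistHit` option, and the text after the `CHECKPOINT-AUTO-PASS:` prefix. It is not the actual `applyCheckpointAutoPass()` implementation.

```javascript
// Hypothetical gate shape: { warnings: string[], criteria: { name, passed }[] }.
const CONDITIONS = new Map([
  ['no_warnings', (gate) => gate.warnings.length === 0],
  ['all_criteria_passed', (gate) => gate.criteria.every((c) => c.passed)],
]);

// Returns a context line when the checkpoint may auto-pass, otherwise null.
function checkpointAutoPass(name, autoPassWhen, gate, { stoplistHit = false } = {}) {
  // Assumed stoplist override: security-sensitive runs never auto-pass.
  if (stoplistHit) return null;
  // A null or unknown condition means auto-pass is not configured.
  const condition = CONDITIONS.get(autoPassWhen);
  if (!condition) return null;
  return condition(gate) ? `CHECKPOINT-AUTO-PASS: ${name} (${autoPassWhen})` : null;
}

const cleanGate = { warnings: [], criteria: [{ name: 'tests', passed: true }] };
console.log(checkpointAutoPass('a', 'no_warnings', cleanGate));
// CHECKPOINT-AUTO-PASS: a (no_warnings)
console.log(checkpointAutoPass('a', 'no_warnings', cleanGate, { stoplistHit: true }));
// null
```

Checking the stoplist before the condition keeps the override unconditional: no configuration value can re-enable auto-pass on a flagged run.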
When two gate files land in the same millisecond (common on fast CI filesystems), mtime-only sort is unstable and may pick the wrong "latest" gate. Add filename localeCompare as a stable secondary sort so latest-mode validation is reproducible. Resolves CI flake on tests/gate-validator.test.js:164 "validates every gate when requested".
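The tie-break described above can be sketched as a two-key comparator. The record shape (`name`, `mtimeMs`), the function name `sortGatesLatestFirst`, and the gate filenames are hypothetical, and the direction of the filename tie-break (the lexicographically later name counts as newer) is an assumption, since the commit message only says that `localeCompare` is the secondary key.

```javascript
// Hypothetical gate records; in a real run mtimeMs would come from fs.statSync().
function sortGatesLatestFirst(gates) {
  return [...gates].sort((a, b) => {
    // Primary key: newest modification time first.
    if (b.mtimeMs !== a.mtimeMs) return b.mtimeMs - a.mtimeMs;
    // Secondary key: filename, so same-millisecond ties resolve the same way
    // regardless of the order in which the directory listing returned them.
    return b.name.localeCompare(a.name);
  });
}

const gates = [
  { name: 'stage-02.json', mtimeMs: 1000 },
  { name: 'stage-03.json', mtimeMs: 1000 }, // same millisecond as stage-02
  { name: 'stage-01.json', mtimeMs: 900 },
];
const latest = sortGatesLatestFirst(gates)[0];
console.log(latest.name); // stage-03.json
```

With only the mtime key, the two same-millisecond entries would keep whatever relative order the input had, so the "latest" gate would depend on directory listing order. The filename key removes that dependency.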
Summary
Two commits forming one logical elevation effort:

- `feat: deepen claude-dev-team parity`: closes the v1.0.0 audit gaps. Documents the stage-renumbering divergence, fleshes out role prompts (24 → 100-170 lines each), adds the Stage 0 safety stoplist + budget-gate config + async-checkpoint config, hardens `approval-derivation.js` with file locking and atomic writes, ports the audit-phases reference, and replaces the row-existence parity check with a deep content-depth check.
- `feat: elevate codex with claude depth and shared runtime features`: ports claude-dev-team's prose depth AND adds three new runtime features neither framework had.

Prose depth ported from claude

- `.codex/rules/pipeline.md`: 145 → 393 lines (review shape scoped/matrix, READ-ONLY Reviewer Rule, gate merge strategy, review round limit, stage durations, parallelism)
- `.codex/rules/gates.md`: 74 → 280 lines (per-stage extra-field examples)
- `.codex/rules/coding-principles.md`: 62 → 151 lines
- `.codex/rules/execution-profiles.md`: 17 → 106 lines (full local/app-worktree/cloud model with parallelism patterns)

New narrative artifacts

- `EXAMPLE.md`: 218-line end-to-end pipeline walkthrough (codex-ized password-reset feature)
- `CHANGELOG.md`: versions v1.0.0, v1.1.0 (this), v1.2.0 (unreleased placeholder)
- `CONTRIBUTING.md`: local dev setup, test/lint/parity commands, PR conventions, stage-numbering note

New runtime features (neither framework had these)

- `scripts/budget.js`: `budget.enabled` in `.codex/config.yml`; writes `pipeline/budget.md`; emits `stage-budget.json ESCALATE` on overrun (or warns). `init`/`update`/`check` subcommands.
- `applyCheckpointAutoPass()` in `codex-team.js`: `checkpoints.{a,b,c}.auto_pass_when` config (`no_warnings`, `all_criteria_passed`); writes `CHECKPOINT-AUTO-PASS:` line to context. Stoplist override prevents auto-pass on security-sensitive runs.
- `scripts/visualize.js`: `stateDiagram-v2` of the active pipeline run, color-coded by gate status (PASS/FAIL/ESCALATE/missing). Writes `pipeline/diagram.md`.

Test rigor
125 → 169 tests across 16 suites. New tests:

- `tests/budget.test.js`: 15 tests (init, update, check escalate/warn paths, disabled-mode no-op)
- `tests/checkpoints.test.js`: 15 tests (each condition + null default + stoplist override)
- `tests/visualize.test.js`: 14 tests (empty/active/complete pipelines, valid Mermaid syntax)

Plus all earlier deepening tests (parity-check `main()` + mutation tests, role-prompt line-count checks, config-key validation, audit-phases reference).

Stage numbering preserved
Codex keeps its collapsed numbering (Stage 5 = pre-review with `security_review_required` flag). The translation table to claude's Stage 4.5a/4.5b lives in `docs/parity/claude-dev-team-parity.md` under "Stage Numbering Divergence".

Test plan

- `npm test`: all 169 pass
- `npm run doctor`: all PASS
- `npm run parity:check`: passes (deep check)
- `npm run lint`: passes
- `npm run budget -- init`: works (no-op when disabled)
- `npm run visualize`: writes `pipeline/diagram.md` with valid Mermaid
- `npm run pipeline -- "Test feature"`: workspace bootstraps cleanly
- `npm run next`: track-aware advancement still works

🤖 Generated with Claude Code