A project-level harness for AI coding agents (built around Claude Code, but the patterns generalize) that adds enforcement, scaffolding, cost-aware quality gates, background audits, multi-agent cross-validation, scheduled / unattended agent runs, and a self-testing layer to any codebase. This is a sanitized public template distilled from a private working harness used team-wide on a Rust + AWS Lambda backend; domain specifics have been stripped, the structural patterns are kept intact.
- Enforcement — pre/post tool-call shell hooks that block forbidden commit footers, prevent panics in production paths, auto-format on save, and force long-running checks into the background. See
.claude/hooks/. - Scaffolding — slash-command skills that codify a project's "how to add X" recipes as agent-readable runbooks. See
.claude/skills/. - Cost-aware quality gates — a runner that compresses verbose tool output via a cheap model before it lands in the conversation context. See
.claude/scripts/check.sh. - Background pre-merge audits — detached Claude Haiku audits spawned on PR-lifecycle commands, replacing CI minutes with a few cents of model calls. See
.claude/hooks/spawn-pr-audit.shand.claude/prompts/pre-merge-audit.md. - Multi-agent cross-validation — exploits the price gap between Claude and Codex by running them as independent reviewers of each other.
EnterPlanModespawns a backgroundcodex execplanning the same task;ExitPlanModewaits for it and feeds the independent plan back as a one-shot deny so the primary agent must reconcile before proceeding. Code-review uses the same separation-of-concerns principle: the implementer agent never reviews its own output — a Codex pass and a subagent pass review independently. See.claude/hooks/codex-cross-validate-plan-start.sh. - Self-testing — a shell-based test suite for the hooks themselves, so harness changes can't silently regress. See
.claude/hooks/tests/. - Unattended scheduled agents — the harness is also a runtime for long-lived agent jobs, not just an inline assistant. Cron-style and one-shot agent runs handle the recurring engineering work that doesn't need a human in the loop: dependency audits, regression checks against the hook suite, stale-document sweeps, prompt-registry migrations, observability passes. A day's worth of routine engineering work lands in the morning report instead of consuming the developer's working hours.
AI-assisted development surfaces four problems at once in any non-trivial codebase. Context drift: rules in long-form docs (CLAUDE.md, AGENTS.md, READMEs) fade across hundreds of agent turns. Hard-to-audit agent actions: commits, PRs, and base-branch pushes happen without a separate review pass. Cost blowups: verbose compiler or test output dumped into the conversation context evicts something more important and inflates token use. And the absence of team-level policy enforcement beyond hope and code review.
This harness addresses all four. Hooks enforce policy at the tool-call boundary; skills re-state recipes at the moment they apply; a compressing runner protects context budgets; and a detached audit checks each PR-bound diff against repo-specific contracts before merge.
- Fail-open over fail-closed. Hooks silently exit 0 on missing dependencies — a misconfigured dev machine should never block a developer from working.
- Structural over textual. Pattern matchers normalize, allowlist subcommands, or diff before flagging; wherever a textual approach has obvious false-positive cases, the harness invests in a structural one.
- Nudge before block. Hard blockers exist only for rules with an unambiguous canonical form; everything else is an informational reminder via
additionalContextJSON. - Test-the-tools. Hook changes require a test case in the bundled suite before merging — the harness eats its own dog food.
- Money awareness. Output compression, audit word caps, and running expensive checks on a cheaper model locally instead of in CI are deliberate choices to preserve both context budget and dollars.
- Don't lock the team in. Skills and hooks are committed; per-developer permissions and scratch worktrees are gitignored. The harness expresses a baseline, not a straitjacket.
- Separation of concerns across agents. The implementer never reviews its own work. Plans, reviews, and audits are routed to an independent model — usually a cheaper one — so the harness gets a second opinion without doubling the primary agent's cost.
- Agent as employee, not as pair-programmer. The primary failure mode of coding-agent adoption is treating the agent as something you sit next to. The harness pushes the other way: hooks let an agent act unsupervised at the tool-call boundary; scheduled jobs let it act unsupervised across time. A well-bounded agent doing routine work overnight is more valuable than a chatty one that needs hand-holding.
- Clone or copy this repo's
.claude/directory into your own project's root. - Edit
.claude/settings.jsonif your layout differs — the wiring uses$CLAUDE_PROJECT_DIRso the defaults work for most repos. - Run
bash .claude/hooks/tests/run-all.shto verify the harness is healthy on your machine. - Customize the example domain rule in
.claude/prompts/pre-merge-audit.md— the six checks that ship are illustrative; the structure is the reusable part. - Commit, PR, and your team picks up the same baseline on the next pull.
This is a template, not a product.
- MIT licensed. Use freely; the patterns are more interesting than the code.
- No language assumptions beyond Rust + AWS Lambda in the examples. The example hooks reference
cargo,rustfmt, and.unwrap()because that's the soil they grew in. The patterns generalize cleanly to other stacks (Python + Lambda, Go + Cloud Run, TypeScript + Vercel) — you will need to adapt the example shell glue. - No CI integration. The pre-merge audit hook is the only "CI-shaped" piece, and it runs locally by design.
- The audit prompt's six checks are illustrative. Fork and rewrite them for your stack's contracts; the harness is the framing, not the rule set.
Distilled from production use at a private Rust + AWS Lambda backend, 2025–2026.