agent-harness

Built by Fang — contact via GitHub or LinkedIn.

A project-level harness for AI coding agents (built around Claude Code, but the patterns generalize) that adds enforcement, scaffolding, cost-aware quality gates, background audits, multi-agent cross-validation, scheduled / unattended agent runs, and a self-testing layer to any codebase. This is a sanitized public template distilled from a private working harness used team-wide on a Rust + AWS Lambda backend; domain specifics have been stripped, but the structural patterns are kept intact.

What's inside

Enforcement — pre/post tool-call shell hooks that block forbidden commit footers, prevent panics in production paths, auto-format on save, and force long-running checks into the background. See .claude/hooks/.
Scaffolding — slash-command skills that codify a project's "how to add X" recipes as agent-readable runbooks. See .claude/skills/.
Cost-aware quality gates — a runner that compresses verbose tool output via a cheap model before it lands in the conversation context. See .claude/scripts/check.sh.
Background pre-merge audits — detached Claude Haiku audits spawned on PR-lifecycle commands, replacing CI minutes with a few cents of model calls. See .claude/hooks/spawn-pr-audit.sh and .claude/prompts/pre-merge-audit.md.
Multi-agent cross-validation — exploits the price gap between Claude and Codex by running them as independent reviewers of each other. EnterPlanMode spawns a background codex exec planning the same task; ExitPlanMode waits for it and feeds the independent plan back as a one-shot deny so the primary agent must reconcile before proceeding. Code-review uses the same separation-of-concerns principle: the implementer agent never reviews its own output — a Codex pass and a subagent pass review independently. See .claude/hooks/codex-cross-validate-plan-start.sh.
Self-testing — a shell-based test suite for the hooks themselves, so harness changes can't silently regress. See .claude/hooks/tests/.
Unattended scheduled agents — the harness is also a runtime for long-lived agent jobs, not just an inline assistant. Cron-style and one-shot agent runs handle the recurring engineering work that doesn't need a human in the loop: dependency audits, regression checks against the hook suite, stale-document sweeps, prompt-registry migrations, observability passes. A day's worth of routine engineering work lands in the morning report instead of consuming the developer's working hours.
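
As an illustration of the enforcement item above, here is a minimal sketch of a pre-tool-call check that blocks a forbidden commit footer. It is written in Python for readability; the hooks that ship in .claude/hooks/ are shell scripts and may work quite differently. The footer list, the function names, and the event shape (a JSON object carrying tool_input.command, with exit code 2 meaning "block and show stderr to the agent") are assumptions for this example, not details taken from the repo.

```python
import json
import shlex
from typing import List, Optional, Tuple

# Hypothetical policy: commit-message lines starting with these prefixes are forbidden.
FORBIDDEN_FOOTERS = ("co-authored-by: claude",)


def commit_messages(command: str) -> List[str]:
    """Return the -m/--message values of a plain `git commit` call, else []."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return []  # unbalanced quotes: fail open rather than guess
    if len(tokens) < 2 or tokens[0] != "git" or tokens[1] != "commit":
        return []
    messages = []
    for i, tok in enumerate(tokens):
        if tok in ("-m", "--message") and i + 1 < len(tokens):
            messages.append(tokens[i + 1])
        elif tok.startswith("--message="):
            messages.append(tok[len("--message="):])
    return messages


def violation(command: str) -> Optional[str]:
    """Return a human-readable problem if the command carries a forbidden footer."""
    for message in commit_messages(command):
        for line in message.splitlines():
            stripped = line.strip()
            if stripped.lower().startswith(FORBIDDEN_FOOTERS):
                return f"forbidden commit footer: {stripped}"
    return None


def decide(raw_event: str) -> Tuple[int, str]:
    """Map a hook event (JSON text) to (exit_code, stderr_message).

    Assumed convention: 0 lets the tool call through, 2 blocks it.
    """
    try:
        event = json.loads(raw_event)
    except json.JSONDecodeError:
        return 0, ""  # malformed input: fail open
    tool_input = event.get("tool_input") if isinstance(event, dict) else None
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    if not isinstance(command, str):
        return 0, ""
    problem = violation(command)
    return (2, problem) if problem else (0, "")
```

A real hook would read the event from standard input, print the message to standard error, and exit with the returned code. The sketch tokenizes the command instead of grepping the raw string, so a footer that is merely echoed or mentioned outside a commit message is not flagged; it only recognizes the plain `git commit -m` form.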

Why this exists

AI-assisted development surfaces four problems at once in any non-trivial codebase. Context drift: rules in long-form docs (CLAUDE.md, AGENTS.md, READMEs) fade across hundreds of agent turns. Hard-to-audit agent actions: commits, PRs, and base-branch pushes happen without a separate review pass. Cost blowups: verbose compiler or test output dumped into the conversation evicts more important context and inflates token use. Missing policy enforcement: nothing holds agents to team-level policy beyond hope and code review.

This harness addresses all four. Hooks enforce policy at the tool-call boundary; skills re-state recipes at the moment they apply; a compressing runner protects context budgets; and a detached audit checks each PR-bound diff against repo-specific contracts before merge.
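
The compressing runner can be pictured roughly as follows. This is a sketch under stated assumptions, not the logic of .claude/scripts/check.sh: the character threshold, the function name run_check, and the summarize callable (standing in for whatever cheap-model call the real runner makes) are all hypothetical.

```python
import subprocess
from typing import Callable, List, Tuple

MAX_PASSTHROUGH_CHARS = 2000  # hypothetical budget for text entering the conversation


def run_check(
    argv: List[str],
    summarize: Callable[[str], str],
    limit: int = MAX_PASSTHROUGH_CHARS,
) -> Tuple[int, str]:
    """Run a check and return (exit_code, text destined for the conversation).

    Short output passes through untouched. Long output is handed to
    `summarize`, a stand-in for a call to a cheaper model.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    if len(output) <= limit:
        return proc.returncode, output
    try:
        summary = summarize(output)
    except Exception:
        # If the summarizer is unavailable, keep the tail of the output,
        # which is usually where compilers and test runners report failures.
        summary = output[-limit:]
    # Cap the result either way so a verbose summary cannot blow the budget.
    return proc.returncode, summary[:limit]
```

The exit code is always preserved, so the agent still sees whether the check passed even when the text has been compressed.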

Design principles

Fail-open over fail-closed. Hooks silently exit 0 on missing dependencies — a misconfigured dev machine should never block a developer from working.
Structural over textual. Pattern matchers normalize, allowlist subcommands, or diff before flagging; wherever a textual approach has obvious false-positive cases, the harness invests in a structural one.
Nudge before block. Hard blockers exist only for rules with an unambiguous canonical form; everything else is an informational reminder via additionalContext JSON.
Test-the-tools. Hook changes require a test case in the bundled suite before merging — the harness eats its own dog food.
Money awareness. Output compression, audit word caps, and running expensive checks on a cheaper model locally instead of in CI are deliberate choices to preserve both context budget and dollars.
Don't lock the team in. Skills and hooks are committed; per-developer permissions and scratch worktrees are gitignored. The harness expresses a baseline, not a straitjacket.
Separation of concerns across agents. The implementer never reviews its own work. Plans, reviews, and audits are routed to an independent model — usually a cheaper one — so the harness gets a second opinion without doubling the primary agent's cost.
Agent as employee, not as pair-programmer. The primary failure mode of coding-agent adoption is treating the agent as something you sit next to. The harness pushes the other way: hooks let an agent act unsupervised at the tool-call boundary; scheduled jobs let it act unsupervised across time. A well-bounded agent doing routine work overnight is more valuable than a chatty one that needs hand-holding.
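
The first and third principles (fail-open, nudge before block) combine naturally in a single hook skeleton. The sketch below is illustrative only: run_hook, nudge, and the check callable are hypothetical names, and the JSON shape (hookSpecificOutput with hookEventName and additionalContext) is an assumption about how an informational reminder is passed back to the agent.

```python
import json
import shutil
from typing import Callable, Iterable, Optional, Tuple


def nudge(message: str, event_name: str = "PostToolUse") -> str:
    """Build an informational reminder as JSON (assumed output shape)."""
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": event_name,
                "additionalContext": message,
            }
        }
    )


def run_hook(
    required_tools: Iterable[str],
    check: Callable[[], Optional[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[int, str]:
    """Return (exit_code, stdout) for a non-blocking hook.

    `check` returns a reminder string, or None when there is nothing to say.
    Every path exits 0: this hook can nudge but never block.
    """
    if any(which(tool) is None for tool in required_tools):
        return 0, ""  # fail open: a missing dependency must not stop work
    try:
        reminder = check()
    except Exception:
        return 0, ""  # fail open: a bug in the hook must not stop work either
    if reminder is None:
        return 0, ""
    return 0, nudge(reminder)
```

A hard blocker would differ only in its last step, returning a non-zero code for a rule with an unambiguous canonical form. Keeping the nudge path at exit 0 everywhere makes the reminder-versus-block distinction visible in the code.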

How to adopt

Clone or copy this repo's .claude/ directory into your own project's root.
Edit .claude/settings.json if your layout differs — the wiring uses $CLAUDE_PROJECT_DIR so the defaults work for most repos.
Run bash .claude/hooks/tests/run-all.sh to verify the harness is healthy on your machine.
Customize the example domain rule in .claude/prompts/pre-merge-audit.md — the six checks that ship are illustrative; the structure is the reusable part.
Commit and open a PR; your team picks up the same baseline on the next pull.

What's intentionally not here

This is a template, not a product.

MIT licensed. Use freely; the patterns are more interesting than the code.
No language assumptions beyond Rust + AWS Lambda in the examples. The example hooks reference cargo, rustfmt, and .unwrap() because that's the soil they grew in. The patterns generalize cleanly to other stacks (Python + Lambda, Go + Cloud Run, TypeScript + Vercel), but you will need to adapt the example shell glue.
No CI integration. The pre-merge audit hook is the only "CI-shaped" piece, and it runs locally by design.
The audit prompt's six checks are illustrative. Fork and rewrite them for your stack's contracts; the harness is the framing, not the rule set.

Distilled from production use on a private Rust + AWS Lambda backend, 2025–2026.
