DoD-Guard

Definition of Done as an executable barrier for Claude Code

Stops AI agents (and humans) from declaring tasks done while the code is still half-baked.

Why this exists

LLM-driven coding assistants are wired to declare "done." The failure modes are everywhere:

A function ships as pass / return None / TODO: implement.
A test asserts expect(x).toBeDefined() — it proves the function exists, not that it works.
A summary claims "tests pass" while three tests were silently .skip-ed.
An end-to-end behavior is announced without a single command being run.
A confident final report has zero supporting evidence.

DoD-Guard turns the Definition of Done from a prompt into a wall the agent cannot talk past. Hooks block at the source of every shortcut. Read-only adversarial subagents audit the orchestrator from the outside. The Stop hook refuses to release the turn while the DoD is unmet — with proper stop_hook_active loop prevention.

How it works

Five reinforcing layers, ordered by cost-to-execute (cheapest first):

┌──────────────────────────────────────────────────────────────┐
│  L5  Skills            reframe how the orchestrator plans    │
│  L4  Slash commands    /dod:verify  /dod:audit  /dod:confess │
│  L3  Subagents         7 adversarial validators (read-only)  │
│  L2  Hooks             SessionStart · PostToolUse · Stop     │
│  L1  Detectors         bash + python — milliseconds, no LLM  │
└──────────────────────────────────────────────────────────────┘

Layer	What it stops
Detectors	Stubs, TODOs, empty bodies, tautological tests, NotImpl markers
Hooks	Bypass attempts via edit, commit, declaring done before verifying
Subagents	Bugs detectors miss; weak tests; unverified claims; regressions; e2e gaps
Slash commands	Manual entry points; emergency rituals (`/dod:confess`)
Skills	Reframe the orchestrator's worldview when entering a DoD-guarded project

See docs/ARCHITECTURE.md for the layer diagram and the full flow on "Claude declares done."

Install

Inside any Claude Code session:

/plugin marketplace add https://github.com/atoslins/dod-guard
/plugin install dod-guard@dod-guard-local
/reload-plugins

To develop locally:

git clone https://github.com/atoslins/dod-guard
/plugin marketplace add /absolute/path/to/dod-guard
/plugin install dod-guard@dod-guard-local
/reload-plugins

First-run

Inside any project you want to guard:

/dod:init                # detects stack (Node / Python / Go / Rust), writes config + DOD.md
/dod:checklist           # shows the Definition of Done
/dod:verify              # 30-second deterministic check

That's it. From that point on, hooks fire automatically on every edit, every commit attempt, and every turn-end. If the orchestrator tries to declare "done" with a stub in the diff, the Stop hook returns:

{"decision": "block",
 "reason": "DoD-Guard: cannot end the turn — Definition of Done is unmet.
            Run /dod:verify to see the full list of blocking issues."}

The commands

Command	Purpose	Time
`/dod:init`	Bootstrap config and DoD checklist for the project	< 5 s
`/dod:verify`	Fast 5-phase deterministic check; refreshes the marker	~ 30 s
`/dod:audit`	Full multi-agent audit (7 subagents in parallel)	2-3 min
`/dod:report`	Read existing reports, format markdown — no LLM	< 1 s
`/dod:stubs`	Fastest scan: stubs and TODOs only	~ 2 s
`/dod:tests`	Audit only test quality (post-TDD)	~ 30 s
`/dod:checklist`	Show DoD with this session's auto-verified items	< 5 s
`/dod:confess`	Force a 7-section paranoid self-audit	~ 10 s

What gets caught

Cross-language (Python / JS / TS / Go / Rust / Ruby / Bash)

Pattern	Detector
`pass` · `...` · `return None` · `{}` as function body	`detect-empty-functions.py` (AST for Python)
`TODO` · `FIXME` · `XXX` · `HACK` markers	`detect-stubs.sh` + `detect-todos.sh --diff`
`NotImplementedError` · `todo!()` · `unimplemented!()` · `panic("not implemented")`	`check-not-implemented.sh`
Action-named fns returning only `null` / `{}` / `[]`	`detect-suspicious-returns.py`

JavaScript / TypeScript-specific

Pattern	Detector
`expect(x).toBeDefined()` / `.not.toBeNull()` / `.toBeTruthy()` on a literal	`detect-test-tautology.py`
`expect(mock).toHaveBeenCalled()` with no matching `.toHaveBeenCalledWith(...)`	`detect-test-tautology.py`
`expect.assertions(0)` · `expect({}).toMatchSnapshot()`	`detect-test-tautology.py`
`assert.ok(true)` · `.to.be.ok` · `.to.exist` (Node / chai weak)	`detect-test-tautology.py`
`test.skip` / `xit` / `xdescribe` added in this diff	`detect-test-tautology.py`

Go-specific

Pattern	Detector
`func NewX() *X { return &X{} }` (constructor with no fields set)	`detect-suspicious-returns.py`
`_ = err` · `_, _ = ...` (error-swallow)	`detect-suspicious-returns.py`
`assert.True(t, true)` · `assert.Equal(t, x, x)` · `assert.NoError(t, nil)`	`detect-test-tautology.py`
`TestX(t *testing.T)` body with no assertion-like call	`detect-test-tautology.py`
`t.Skip(...)` · `t.Log("TODO...")`	`detect-test-tautology.py`
`// nolint:` added in the diff	`detect-stubs.sh` · `detect-todos.sh`

Multi-agent verification

Audit	Subagent
Bugs, edge cases, security, race conditions	`adversarial-reviewer`
Test quality (tautologies, mocks-only, decorative asserts)	`test-quality-auditor`
End-to-end behavior proof (curl, CLI, real probe)	`e2e-verifier`
Regressions vs. the last baseline	`regression-hunter`
Every claim in the completion report cross-checked	`claim-validator`
Stubs, TODOs, completeness	`completeness-auditor`
Final verdict aggregation (one FAIL = FAIL)	`final-judge`

A concrete scenario

"Add a /refund endpoint. We'll wire up the payment gateway later."

Without DoD-Guard: the orchestrator writes a return null stub, adds a test that mocks the gateway and asserts null, declares done. The PR ships broken.

With DoD-Guard:

The orchestrator writes the stub.
PostToolUse hook fires detect-stubs.sh. Returns count: 2 (TODO marker + suspicious return).
Hook emits {"decision": "block", "reason": "DoD-Guard: 2 issue(s) detected..."}.
The orchestrator cannot continue without either implementing the gateway or returning 501 Not Implemented with a test for that response.
Before declaring done, /dod:confess forces a 7-section honest report. claim-validator cross-checks each claim against the diff.
Stop hook re-runs verification. PASS only when zero blocking issues remain.

See docs/EXAMPLES.md for four full scenarios.

Customization

Every detector and every hook is tunable per-project via .dod-guard.json:

{
  "strictness": "normal",                          // strict | normal | lenient
  "detectors": {
    "stubs":          { "enabled": true, "severity": "block" },
    "test_tautology": { "enabled": true, "severity": "block" }
  },
  "hooks": {
    "post_edit":  { "severity": "block" },
    "pre_commit": { "require_verify_recent": true, "verify_ttl_seconds": 600 },
    "stop_gate":  { "skip_tests": false }
  },
  "audit": {
    "parallel": true,
    "subagents": ["completeness-auditor", "test-quality-auditor", "regression-hunter"]
  },
  "exemptions": {
    "paths": ["**/migrations/**", "vendor/**", "src/generated/**"]
  }
}

Highlights:

Three strictness levels, per-detector severity overrides.
Custom regex patterns (e.g., a company-specific @INTERNAL_TODO).
Glob-based exemptions (with DODG_NO_EXEMPTIONS=1 bypass for tests).
Custom detectors via local scripts/local/detect-*.py.
Per-stack DoD templates auto-selected by /dod:init (Node, Go, generic).

Full reference: docs/CUSTOMIZATION.md.

Project layout

.claude-plugin/
  plugin.json                    manifest
  marketplace.json               single-plugin marketplace entry
hooks/
  hooks.json                     event wiring
  handlers/                      5 hook scripts
scripts/
  detect-*.{sh,py}               7 detectors
  run-full-suite.sh              test runner auto-detection
  run-verification-pipeline.sh   aggregator → JSON verdict
  lib/                           shared helpers (exemptions, language detection)
agents/                          7 adversarial subagents
commands/                        8 slash commands
skills/*/SKILL.md                4 behavior-shaping skills
templates/                       .dod-guard.json + DOD.md per stack
docs/                            ARCHITECTURE · CUSTOMIZATION · EXAMPLES · DEVELOPMENT
tests/
  test-*.sh                      4 test suites (94 assertions)
  fixtures/                      negative + clean test projects

Verification

The plugin verifies itself. A 94-assertion test suite covers every detector, every hook, and every adversarial agent. Run any of them with one command:

bash tests/test-detectors.sh        # 28 / 28 — all detectors against fixtures
bash tests/test-hooks.sh            # 18 / 18 — payload simulation for the 5 hooks
bash tests/test-agents-syntax.sh    # 36 / 36 — agent + command frontmatter
bash tests/test-integration.sh      # 12 / 12 — end-to-end init → block → fix → pass

shellcheck -x is clean on every shell script. python3 -m py_compile is clean on every Python script. claude plugin validate passes for both plugin.json and marketplace.json.

The plugin's own source is held to its own rules: bash scripts/run-verification-pipeline.sh --skip-tests returns VERDICT: PASS, 0 issues.

Contributing

PRs welcome. The short version:

git clone https://github.com/atoslins/dod-guard
cd dod-guard
bash tests/test-detectors.sh
bash tests/test-hooks.sh
bash tests/test-agents-syntax.sh
bash tests/test-integration.sh
shellcheck -x hooks/handlers/*.sh scripts/*.sh tests/*.sh

The contribution guide is in docs/DEVELOPMENT.md. Style, testing, and release process are documented there.

Open issues and discussions are tracked at github.com/atoslins/dod-guard/issues.

FAQ

Can the orchestrator just ignore the hooks? No. Hooks are executed by Claude Code itself before and after every tool call. They return JSON the agent must obey ({"decision": "block"} halts the turn). The agent literally cannot proceed.

Doesn't this slow everything down? The detectors are bash + Python AST — under 100 ms on small projects, under a second on large ones. The full /dod:audit (7 subagents in parallel) takes 2-3 minutes and is meant for end-of-task, not every turn.

What happens if a hook itself has a bug? The Stop hook honors stop_hook_active: true from the payload — if the hook keeps blocking, Claude Code routes the agent back to the user after one cycle. No infinite loops. Other hooks no-op silently when .dod-guard.json is absent.

Does this replace code review? No. It catches the category of failure that LLM agents disproportionately produce (premature completion, decorative tests, swallowed errors). Human review still catches design issues, architectural drift, and product-fit problems. Use both.

Can I disable a specific detector? Yes, in .dod-guard.json. But prefer narrowing patterns over disabling — the detector is cheap, the value of catching one real bug is high.

Acknowledgments

DoD-Guard borrows ideas from:

The Claude Code plugin and hook ecosystem.
The adversarial-review pattern of independent skeptic subagents.
The Test-Driven Development / Definition-of-Done discipline from agile and lean engineering.

The thread that ties them together is a single principle: evidence before assertion, always.

License

_{Built with — and for — Claude Code.}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DoD-Guard

Definition of Done as an executable barrier for Claude Code

Why this exists

How it works

Install

First-run

The commands

What gets caught

Cross-language (Python / JS / TS / Go / Rust / Ruby / Bash)

JavaScript / TypeScript-specific

Go-specific

Multi-agent verification

A concrete scenario

Customization

Project layout

Verification

Contributing

FAQ

Acknowledgments

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.claude-plugin		.claude-plugin
.github		.github
agents		agents
assets		assets
commands		commands
docs		docs
hooks		hooks
scripts		scripts
skills		skills
templates		templates
tests		tests
.dod-guard.json		.dod-guard.json
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
DOD.md		DOD.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

DoD-Guard

Definition of Done as an executable barrier for Claude Code

Why this exists

How it works

Install

First-run

The commands

What gets caught

Cross-language (Python / JS / TS / Go / Rust / Ruby / Bash)

JavaScript / TypeScript-specific

Go-specific

Multi-agent verification

A concrete scenario

Customization

Project layout

Verification

Contributing

FAQ

Acknowledgments

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages