ClaimBound Evidence

ClaimBound turns a narrow public AI, ML, data or software-development claim into a small evidence card: a checkable record with the protocol, source boundary, hashes, exact result status, claim boundary and reproduction level.

It is not a model leaderboard, production forecasting service or certification authority. It is an open-source toolkit for asking one plain question:

Where is the evidence?

If there is no evidence card, the statement is still only a claim.

What A Card Shows

An evidence card keeps the useful claim small enough to inspect:

| Card field | Plain meaning |
| --- | --- |
| Claim | The exact public statement being checked. |
| Source | The public source or source documentation used for the check. |
| Protocol | The rules fixed before the result was accepted. |
| Status | Passed, negative, blocked, insufficient or reproduced. |
| Boundary | What the card proves and what it must not be used to claim. |
| Reproduction | Whether another run reproduced the outcome, and with what limits. |

Raw payloads, prompt text, transcripts and restricted source files stay outside the public repository unless redistribution is clearly allowed. The public record stores hashes, summaries and links so a local operator or organization can keep private evidence reproducible without publishing sensitive material.
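
The hash-instead-of-payload idea can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's own code: the function names and record fields (`sha256`, `byte_length`, `summary`, `source_url`) are assumptions chosen for the example, and the payload is a placeholder.

```python
import hashlib
import json


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as lowercase hex."""
    return hashlib.sha256(payload).hexdigest()


def public_record(payload: bytes, summary: str, source_url: str) -> dict:
    """Build a publishable record that never contains the raw payload."""
    return {
        "source_url": source_url,        # link to the public source documentation
        "sha256": sha256_hex(payload),   # lets a holder of the bytes re-verify
        "byte_length": len(payload),
        "summary": summary,              # sanitized, human-written description
    }


def matches(record: dict, payload: bytes) -> bool:
    """Check that privately held bytes still match the published hash."""
    return record["sha256"] == sha256_hex(payload)


# Placeholder bytes standing in for a file that stays outside the repository.
private_bytes = b"restricted source file contents"
record = public_record(
    private_bytes,
    "Example payload, kept local.",
    "https://example.org/source-docs",
)
print(json.dumps(record, indent=2))
```

Anyone who holds the private bytes can call `matches(record, private_bytes)` to confirm they are looking at the same material, while the public record reveals only the digest, length, summary and link.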

Choosing Protocol V2 And V3

The evidence card is the result. Protocol v2 and v3 are optional guardrails around the result, not stronger evidence by themselves.

| Need | Use | Read next |
| --- | --- | --- |
| Publish one completed narrow result. | Evidence card JSON/SVG and sanitized report. | Evidence card protocol |
| Keep related R&D fair across diagnostics, proof tracks, stop rules and closure. | Protocol v2 family/frontier ledgers. | R&D family protocol v2 |
| Give public readers a compact map of iron claims, flow claims, tombstones and blocked branches. | Protocol v3 tree overlay. | Protocol v3 tree overlay |
| Decide which stack an audience should use. | Smallest honest stack: card only, card + v2, or card + v2 + v3. | Protocol use by layer and audience |

Practical rule: use the smallest layer stack that prevents overclaiming. Do not add v2 or v3 to make a weak result look stronger.
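
As a rough sketch, the practical rule can be written as a function that adds a layer only when a concrete need calls for it. The function name, parameter names and layer labels are hypothetical. The sketch also assumes that a v3 tree overlay always sits on top of v2 ledgers, which is how the three stacks in the table above are listed.

```python
def smallest_stack(needs_family_ledger: bool, needs_public_tree: bool) -> list:
    """Return the smallest layer stack that covers the stated needs.

    The strength of the result is deliberately not an input: a weak
    result never justifies adding v2 or v3.
    """
    stack = ["evidence_card"]  # the card is always the result itself
    if needs_family_ledger or needs_public_tree:
        stack.append("protocol_v2")  # family/frontier ledgers
    if needs_public_tree:
        stack.append("protocol_v3")  # tree overlay for public readers
    return stack


print(smallest_stack(False, False))
print(smallest_stack(True, False))
print(smallest_stack(True, True))
```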

Example: AI System-Card Claim

Public claim:

Anthropic publishes a public system-card index for its AI models.

ClaimBound narrows it:

Can the official Anthropic system-card page be source-audited by URL, access
date, content type, expected markers and SHA-256 without making any model
safety, model quality or runtime-behavior claim?

Current card status:

PASSED_UNDER_PROTOCOL / GREEN_VALIDATED

What this proves: the public source boundary passed the documented source-audit gate at access time.

What it does not prove: that Claude or any Anthropic runtime is safer, better, unchanged, deployment-ready or benchmark-superior.

Read the JSON or open the visual SVG card.
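
A source-audit gate of this kind might look like the following Python sketch. It is an illustration under stated assumptions, not the project's runner: `SourceAudit`, `audit_source`, the example URL, the page bytes, the marker string and the access date are all made up for the example. The audit works on bytes that were already downloaded, so it checks only document properties.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceAudit:
    url: str
    access_date: str
    content_type: str
    sha256: str
    missing_markers: tuple
    passed: bool


def audit_source(url, body, content_type, expected_type, expected_markers, access_date):
    """Audit one already-downloaded document; says nothing about any model."""
    text = body.decode("utf-8", errors="replace")
    missing = tuple(m for m in expected_markers if m not in text)
    # Compare the media type only, ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    return SourceAudit(
        url=url,
        access_date=access_date.isoformat(),
        content_type=media_type,
        sha256=hashlib.sha256(body).hexdigest(),
        missing_markers=missing,
        passed=(media_type == expected_type and not missing),
    )


# Hypothetical page bytes and access date, used only for illustration.
page = b"<html><title>Example index</title><h1>System cards</h1></html>"
result = audit_source(
    url="https://example.org/system-cards",
    body=page,
    content_type="text/html; charset=utf-8",
    expected_type="text/html",
    expected_markers=["System cards"],
    access_date=date(2025, 1, 1),
)
print(result.passed, result.sha256[:12])
```

The gate passes only when the content type matches and every expected marker is present. Nothing in it inspects model behavior, which is why a pass supports the narrowed claim and nothing broader.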

Example Cards

These are deliberately different outcomes: green means a narrow claim passed, yellow means reproduction is useful but limited, amber means the source boundary blocked a fair result, and red means the protocol ran but the claim did not pass.

| Example | Status | What the card proves | Links |
| --- | --- | --- | --- |
| Anthropic system-card source audit | `PASSED_UNDER_PROTOCOL` | The official system-card index passed a narrow public-document source audit. | JSON / SVG |
| EEA AQ manual track | `BLOCKED_SOURCE` | The larger PM10 manual track could not fairly run from an incomplete public URL manifest. | JSON / SVG |
| NASA POWER D-103 | `PASSED_UNDER_PROTOCOL` with `REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT` | The frozen gate-level outcome reproduced, but fresh source bytes differed. | JSON / SVG |
| NOAA CO-OPS D-131 | `NEGATIVE_RESULT_UNDER_PROTOCOL` | The official-source run completed and honestly did not pass the frozen gate. | JSON / SVG |
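
One way to read the four examples is as a mapping from status to card color. The Python sketch below encodes only the combinations shown in the table. The function name is hypothetical, and the authoritative color semantics are defined in the result status protocol, not in this snippet.

```python
from typing import Optional


def card_color(status: str, reproduction: Optional[str] = None) -> str:
    """Map the statuses from the example table to a card color."""
    if status == "BLOCKED_SOURCE":
        return "amber"   # the source boundary blocked a fair result
    if status == "NEGATIVE_RESULT_UNDER_PROTOCOL":
        return "red"     # the protocol ran, the claim did not pass
    if status == "PASSED_UNDER_PROTOCOL":
        if reproduction == "REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT":
            return "yellow"  # reproduction is useful but limited
        return "green"       # a narrow claim passed
    # Statuses not shown in the table are rejected instead of guessed.
    raise ValueError(f"status not covered by this sketch: {status}")


print(card_color("PASSED_UNDER_PROTOCOL"))
print(card_color("PASSED_UNDER_PROTOCOL", "REPRODUCED_OUTCOME_WITH_SOURCE_BYTE_DRIFT"))
```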

Twelve Public Use Categories

The public examples are easier to understand by audience. Every row below says who the evidence discipline helps, what kind of claim it checks, what has been proven so far, and what to do next.

| Audience / category | Typical task | Current examples | What we proved | Status and next step |
| --- | --- | --- | --- | --- |
| Public AI transparency readers | Check whether AI vendors publish inspectable public documentation. | Anthropic system cards, OpenAI GPT-5 system-card PDF, Google DeepMind model cards, xAI Grok prompts. | Official public pages or repositories were reachable and hashed under a source-audit boundary. | Green source-audit cards exist. Next: independent reruns and narrower runtime-equivalence requests where sources allow it. |
| AI and LLM evaluation teams | Check whether a benchmark or model claim has model ID, prompt set, scoring rule and transcript hashes. | `MODEL_EVAL_D001`. | The current source did not provide enough material for a fair public evidence result. | `BLOCKED_SOURCE`. Next: provide frozen prompts, model/API metadata, transcript hashes and scoring code. |
| AI risk, security and automation-control teams | Turn broad AI-control rules into bounded claims for agent tool use, prompt-injection checks, security-sensitive code, robotics scenarios, vehicle-software release gates or incident evidence. | AI risk control with ClaimBound, protocol v2/v3 planning layers. | Guidance exists for using ClaimBound as an evidence-bound control layer without claiming certification, hacker-proofing, physical runtime control or complete risk removal. | Next: publish a completed narrow card for one AI-agent, security-scan, robotics-scenario or release-gate claim. |
| Software developers and maintainers | Add a reviewable evidence trail for risky, public, regression-sensitive or AI-assisted software changes. | Software development workflow, protocol v2/v3 planning layers. | ClaimBound can document fixed commands, fixtures, sanitized logs, hashes and limitation boundaries without replacing tests, CI or code review. | Guidance exists. Next: publish a completed narrow software evidence card for one build, API, parity or regression claim. |
| Companies with AI products | Turn a product claim into a customer-readable evidence card. | `AI_PRODUCT_CLAIM_D001`. | The public product announcement was not enough to support an empirical pass/fail claim. | `BLOCKED_SOURCE`. Next: publish exact claim, model/source docs, prompt or transcript manifest and limitations. |
| Independent verifiers and public buyers | Decide what is independently checkable before adopting an AI system. | `PROCUREMENT_AI_D001`. | Procurement evidence needs source, scoring and model metadata before it can become decision support. | `BLOCKED_SOURCE`. Next: run a vendor-claim protocol with frozen sources and stop rules. |
| Data stewards and public-data teams | Verify official source pages, rights notes and raw-payload policy before analysis. | EEA Air Quality source audit, EEA AQ manual track. | EEA passed a narrow download-page source audit, but the larger PM10 manual track blocked because the API URL-list manifest was incomplete for BE/NL. | EEA source audit is green; EEA manual track is a blocked-source card. Next: complete raw-payload reruns only with a full external manifest. |
| Civic tech, journalism and watchdogs | Check claims about mobility, infrastructure, climate or public services against official data. | NYC TLC Phase 4 artifact, `CIVIC_CLAIM_D001`. | Current civic examples show why official source access and frozen gates matter before public claims. | Blocked or artifact-only. Next: add a full evidence card or keep the artifact clearly marked as non-card evidence. |
| Open science and reproducibility teams | Reproduce a published result and keep negative or drift outcomes citable. | NASA POWER D-103, `REPRO_APPENDIX_D001`. | NASA reproduced the gate-level outcome with source-byte drift; the reproduction appendix scaffold still needs stronger source linkage. | NASA is yellow-limited reproduction. Next: add independent rerun records. |
| ML researchers | Separate a narrow method result from broad model-superiority language. | `ML_APPENDIX_D001`. | The current appendix scaffold shows required controls, baselines and claim boundary, but no completed empirical result. | `BLOCKED_SOURCE`. Next: run with frozen controls and publish exact pass/negative/blocked status. |
| Educators | Teach reproducible ML discipline with small public examples. | `EDU_REPRO_D001`. | The classroom track is ready as a scaffold, not as a completed evidence claim. | `BLOCKED_SOURCE`. Next: complete a student-friendly run and publish limitations. |
| Funding reviewers and program evaluators | Read what was promised, which source was used, what happened and what cannot be claimed. | `FUNDING_REVIEW_D001`. | A funding appendix needs protocol, source, status and limitations instead of a narrative success claim. | `BLOCKED_SOURCE`. Next: attach validated cards to reports or proposals. |

For the full card list, see `docs/evidence_cards/README.md`. The registry index is `docs/registry/evidence_index.json`.

Start with ClaimBound in 5 minutes for the plain-language version.

Install

uv sync --extra dev
uv run --extra dev python -m pytest -n auto

Quick Start

Create a draft scaffold:

uv run claimbound new

Create the same scaffold non-interactively:

uv run claimbound new \
  --source-url "https://example.org/source-docs" \
  --protocol-id "EXAMPLE_D001" \
  --domain "public-data" \
  --track-type "source_audit" \
  --execution-mode "MANUAL_NO_AI" \
  --out "docs/manual_audit/EXAMPLE_D001"

Run local demo helpers:

uv run claimbound demo eea-source-audit
uv run claimbound demo grok-source-audit
uv run claimbound validate-all

`validate-all` checks committed evidence cards, the registry and any optional `docs/track_families/*_FAMILY_LEDGER.json`, `docs/track_families/*_FRONTIER.json` or `docs/track_families/*_TREE.json` files. Historical cards created before the R&D family protocol do not need retroactive ledgers.
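
To see which optional family files a checkout contains, a small Python sketch can glob the same three patterns. This is not the implementation of `validate-all`: the helper names are hypothetical, and the only check performed is whether each file parses as JSON.

```python
import json
from pathlib import Path

# The three optional file patterns named in the paragraph above.
OPTIONAL_PATTERNS = ("*_FAMILY_LEDGER.json", "*_FRONTIER.json", "*_TREE.json")


def find_optional_ledgers(repo_root: Path) -> list:
    """List optional family, frontier and tree files under docs/track_families."""
    family_dir = repo_root / "docs" / "track_families"
    if not family_dir.is_dir():
        return []  # the files are optional, so a missing directory is fine
    found = []
    for pattern in OPTIONAL_PATTERNS:
        found.extend(sorted(family_dir.glob(pattern)))
    return found


def unparseable(paths) -> list:
    """Return the paths that cannot be read or decoded as JSON."""
    bad = []
    for path in paths:
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):  # JSONDecodeError subclasses ValueError
            bad.append(path)
    return bad


if __name__ == "__main__":
    ledgers = find_optional_ledgers(Path("."))
    print(f"{len(ledgers)} optional file(s), {len(unparseable(ledgers))} unparseable")
```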

Prepare a local-only run root:

uv run claimbound run-root \
  --protocol-id EXAMPLE_D001 \
  --source-url https://example.org/source \
  --operator your-name-or-handle

`claimbound new` creates a request, protocol draft, playbook, checklist, operator declaration, draft card, R&D family ledger and source-probe summary. This scaffold is not evidence. Evidence begins only after an operator freezes the protocol, runs the check, publishes a sanitized report, validates the card and updates the registry.
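
The distinction between a scaffold and evidence can be expressed as a simple completeness check. The step names below are hypothetical labels for the five actions listed above, not identifiers used by the toolkit.

```python
# Hypothetical labels for the five steps that turn a scaffold into evidence.
REQUIRED_STEPS = (
    "protocol_frozen",
    "check_run",
    "sanitized_report_published",
    "card_validated",
    "registry_updated",
)


def missing_steps(done: set) -> list:
    """Return the required steps not yet completed, in workflow order."""
    return [step for step in REQUIRED_STEPS if step not in done]


def is_evidence(done: set) -> bool:
    """A scaffold counts as evidence only when every step is complete."""
    return not missing_steps(done)


fresh_scaffold = set()  # state right after scaffolding
print(is_evidence(fresh_scaffold), missing_steps(fresh_scaffold))
```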

Next Steps: Simple To Technical

| Step | Document | Why read it |
| --- | --- | --- |
| 1 | ClaimBound in 5 minutes | The shortest plain-language explanation. |
| 2 | Evidence card examples | Green, yellow, red and blocked examples in one place. |
| 3 | Getting started | Installation, local run roots and scaffold commands. |
| 4 | Audience and value | Who the project helps, including software developers and AI risk-control teams. |
| 5 | Result status protocol v0.1 | Exact statuses and the color semantics used by cards. |
| 6 | Evidence card protocol v0.1 | Required JSON fields and validation rules. |
| 7 | Current evidence tracks | What the committed results prove and do not prove. |
| 8 | Manual audit protocol v0.1 | How to run a no-AI operator track. |
| 9 | AI operator protocol v0.1 and AI workflow | What AI may draft, run or summarize, and where human approval is required. |
| 10 | AI risk control with ClaimBound | How to use ClaimBound as an evidence-bound AI control layer without claiming certification or complete risk removal. |
| 11 | Scaffold workflow protocol v0.1 | How requests become protocol, playbook, checklist, family ledger and draft card files. |
| 12 | R&D family protocol v2 | How related tracks keep claim lists, budgets, diagnostic/proof separation and closure decisions. |
| 13 | Protocol use by layer and audience | Which protocol layers to use for each audience and work shape. |
| 14 | Protocol layers v2 and v3 | How evidence cards, v2 family/frontier ledgers and v3 tree overlays differ. |
| 15 | Protocol v3 tree overlay | How iron claims, flow claims, tombstones, badge counts and branch-block rules map related work. |
| 16 | Registry direction v0.1 and project next steps | How validated cards become a public registry and what is intentionally out of scope. |

Individual pre-registration charters live in `docs/protocols/`. They are protocol-bound examples, not broad claims.

Manual And AI Tracks

Manual tracks are for human operators who complete checklists and record judgment explicitly. AI-assisted tracks are for cases where an AI agent may draft scaffolds, write deterministic runner code or summarize reports. In both tracks, the final status must come from a protocol, checklist, runner or validator, not from model opinion.

Useful entry points:

Boundary

This repository is independently usable as an open evidence foreground. It does not include, import or require private background technology.

The registry stores validated card metadata and sanitized report references, not raw payloads. Distributed-ledger and chain timestamp features are outside the current roadmap.

For the AI provenance log, use public PRs, commits, releases, checks, evidence cards and registry entries first. GitHub organization audit logs are governance support, not AI provenance by themselves. See AI provenance log and audit logs.

Community

License

Apache-2.0. See LICENSE.
