Merged
87 changes: 87 additions & 0 deletions AGENTS.md
@@ -0,0 +1,87 @@
# Agent Instructions — AICertify

This file is the canonical operational guide for AI coding agents working in this repository (Claude Code, Cursor, Codex, Windsurf, Gemini CLI, Copilot, etc.). Tool-specific files (`CLAUDE.md`, `GEMINI.md`) inherit from this and add only platform-specific notes.

## What this project is

**AICertify** is a Python framework that evaluates AI applications against regulatory frameworks using [Open Policy Agent (OPA)](https://www.openpolicyagent.org/) policies sourced from the sister project [gopal](https://github.com/Principled-Evolution/gopal). It produces audit-ready compliance reports in PDF, Markdown, JSON, and HTML.

The user-facing surface is:

1. **Python API** — `aicertify.application`, `aicertify.regulations`, evaluator classes
2. **CLI** — `python -m aicertify.cli`
3. **Reports** — generated artifacts in `examples/outputs/` you can show to an auditor

## Repository layout

```
aicertify/ Python package
├── __init__.py Public API (re-exports the surface a user sees)
├── cli.py Argparse CLI entry — see "Useful commands" below
├── application.py / regulations.py User-facing fluent API
├── api.py / contract_models.py Contract data model (Pydantic)
├── evaluators/ Pluggable evaluators (Fairness, ContentSafety, …)
├── opa_policies/ Vendored Rego policy tree (mirrors gopal layout)
│ ├── global/v1/ Cross-cutting categories
│ ├── international/ EU AI Act, NIST, India, …
│ ├── industry_specific/ Healthcare, BFS, Automotive, Aviation, …
│ └── operational/ AIOps, cost, corporate
├── report_generation/ ReportLab-based PDF + Markdown + HTML + JSON writers
└── assets/ Logos, images
examples/
├── quickstart.py End-to-end demo — the canonical "does this work?" script
├── sample_contract.json Contract structure reference
└── outputs/ Pre-generated reports for inspection
tests/ pytest tests
pyproject.toml Poetry-managed project metadata
```

## Useful commands

```bash
# Install (editable)
pip install -e .

# Run the end-to-end demo (writes reports/ into the repo)
python examples/quickstart.py

# CLI evaluation
python -m aicertify.cli \
--contract examples/sample_contract.json \
--policy aicertify/opa_policies/international/eu_ai_act/v1 \
--report-format pdf \
--output-dir reports/

# Tests
pytest tests/ -v

# Lint / format
ruff check aicertify/
black aicertify/
```
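
When the CLI needs to be driven from a script, the evaluation command above can be assembled programmatically. The following is a minimal sketch: `build_cli_args` is a hypothetical helper (not part of AICertify), and it uses only the flags shown in the shell example above.

```python
import sys
from pathlib import Path


def build_cli_args(
    contract: Path,
    policy_dir: Path,
    report_format: str = "pdf",
    output_dir: Path = Path("reports"),
) -> list[str]:
    """Assemble the argv list for `python -m aicertify.cli` (hypothetical helper)."""
    return [
        sys.executable,  # reuse the interpreter that runs this script
        "-m",
        "aicertify.cli",
        "--contract",
        contract.as_posix(),
        "--policy",
        policy_dir.as_posix(),
        "--report-format",
        report_format,
        "--output-dir",
        output_dir.as_posix(),
    ]


args = build_cli_args(
    contract=Path("examples/sample_contract.json"),
    policy_dir=Path("aicertify/opa_policies/international/eu_ai_act/v1"),
)
# Show the command without executing it.
print(" ".join(args[1:]))
```

Passing the list to `subprocess.run(args, check=True)` would execute the evaluation. Building an argument list rather than a shell string avoids quoting problems with paths.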

## Conventions

- **Python 3.12 only** (see `requires-python` in `pyproject.toml`). Do not introduce syntax incompatible with 3.12 — do not assume 3.13.
- **Line length 88** (black + ruff default). Don't change this.
- **Pydantic v2** — when defining models, use `model_config` not `class Config`.
- **OPA policies**: never hand-edit `.rego` files under `aicertify/opa_policies/` without first considering whether the change should land upstream in [gopal](https://github.com/Principled-Evolution/gopal). They are a vendored copy.
- **Reports are user-facing legal artifacts** — when changing report generation, run the quickstart and inspect the output PDF before claiming success.
- **Avoid breaking the public API** — anything re-exported from `aicertify/__init__.py` is part of the user-facing contract. Bump the version and note in CHANGELOG.md if you must.
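
The Pydantic v2 convention above can be illustrated with a short sketch. `ExampleContract` and its fields are hypothetical and are not taken from `contract_models.py`; the point is only where the configuration lives.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class ExampleContract(BaseModel):
    """Hypothetical model for illustration only (not an AICertify class)."""

    # Pydantic v2 style: configuration is a `model_config` attribute,
    # which replaces the v1-style nested `class Config`.
    model_config = ConfigDict(extra="forbid", str_strip_whitespace=True)

    application_name: str
    interaction_count: int = 0


contract = ExampleContract(application_name="  demo-app  ")
print(contract.application_name)  # surrounding whitespace is stripped

try:
    # `extra="forbid"` makes unknown fields a validation error.
    ExampleContract(application_name="demo-app", unknown_field=1)
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} validation error(s)")
```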

## What NOT to do

- Don't pin new dependencies aggressively — many users will install AICertify alongside their own stack; tight pins create conflicts.
- Don't replace OPA with another policy engine. The whole value proposition is policy-as-code in Rego.
- Don't commit `reports/` or `examples/outputs/` content unless deliberately updating the canonical samples.
- Don't introduce calls to closed-source SaaS services (Credo AI, etc.) — this is a self-hostable, on-prem-capable framework.
- Don't add a feature that requires `python >= 3.13` — the project is intentionally pinned to 3.12.

## Sister projects

- **[gopal](https://github.com/Principled-Evolution/gopal)** — the upstream Rego policy library. If you find yourself adding a new framework, contribute the `.rego` there, then vendor it here.
- **[Open Policy Agent](https://www.openpolicyagent.org/)** — the engine.

## Conservatism

The author prefers **surgical changes**: do only what was asked, present the plan first when there's any ambiguity, and ask before introducing new abstractions. Critique your own design once for elegance, DRY, KISS, and explainability before presenting it.
43 changes: 43 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,43 @@
# Changelog

All notable changes to **AICertify** are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added
- Centered HTML hero, ordered badge wall, value-prop tagline, and 5 programmatically-generated marketing diagrams in the README (regulatory coverage, architecture, comparison vs Fairlearn / AIF360 / MS RAI / Credo AI, report anatomy, end-to-end flow).
- `AGENTS.md` and `CLAUDE.md` — operational instructions for AI coding agents working in this repository.
- `skills/` directory with 4 Claude Code skills: `run-compliance-check`, `evaluate-contract`, `explain-regulation`, `draft-policy`. Each ships as a slash command once installed into `~/.claude/skills/`.
- Comparison table vs Fairlearn / IBM AI Fairness 360 / Microsoft RAI Toolbox / Credo AI in the README.
- `diagrams/generate_diagrams.py` — reproducible matplotlib script that regenerates every marketing PNG.

### Changed
- README rewritten for product-page clarity: value-prop first, then quickstart, then differentiation, then coverage.

## [0.7.0] — 2025-04

### Added
- Reporting subsystem (`aicertify.report_generation`) producing audit-ready artifacts in PDF (via ReportLab), Markdown, JSON, and HTML.
- Quickstart example (`examples/quickstart.py`) wiring sample interactions through the EU AI Act policy set and emitting a full report.
- Pluggable evaluator classes — `FairnessEvaluator`, `ContentSafetyEvaluator`, `RiskManagementEvaluator`, `ComplianceEvaluator`.
- Sample pre-generated reports under `examples/outputs/` for EU AI Act, loan evaluation, and medical diagnosis use cases.

### Changed
- OPA policies migrated to the standalone [gopal](https://github.com/Principled-Evolution/gopal) library; AICertify now vendors the policy tree under `aicertify/opa_policies/` via Git submodule.
- Enhanced logging across the evaluation pipeline.
- Pre-commit hooks: `ruff`, `black`, security checks.

### Fixed
- Security: bumped `protobuf` to 5.29.5 and `pycares` to 4.9.0 to resolve advisory exposure.
- Security: bumped `transformers` and `setuptools` to resolve security alerts.
- Auto-labeling workflow no longer produces excessive labels.

## Earlier history

For changes prior to 0.7.0, see the [Git log](https://github.com/Principled-Evolution/aicertify/commits/main).

[Unreleased]: https://github.com/Principled-Evolution/aicertify/compare/v0.7.0...HEAD
[0.7.0]: https://github.com/Principled-Evolution/aicertify/releases/tag/v0.7.0
51 changes: 51 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,51 @@
# Claude Code — Project Context for AICertify

> **Read [AGENTS.md](AGENTS.md) first.** This file inherits all of those instructions and only adds Claude Code-specific notes.

## What you're working on

**AICertify** evaluates AI applications against regulatory frameworks (EU AI Act, NIST AI RMF, +13 more) using [OPA](https://www.openpolicyagent.org/) policies from [gopal](https://github.com/Principled-Evolution/gopal). The deliverable is an audit-ready compliance report (PDF / Markdown / JSON / HTML).

## Fast orientation

When the user asks a question about the codebase, start from the entry point that matches it:

1. **For "how does the eval flow work?"** → read [examples/quickstart.py](examples/quickstart.py). It's the canonical user path.
2. **For "what's covered by regulation X?"** → look in `aicertify/opa_policies/<domain>/<framework>/v1/`.
3. **For "what does a report look like?"** → open one in `examples/outputs/`.
4. **For "what's the public Python API?"** → read `aicertify/__init__.py` — it's the surface contract.
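
The policy lookup in step 2 can be sketched as a small directory walk. `list_policies` is a hypothetical helper, and the `international` / `eu_ai_act` values are simply the path segments used in the CLI example in AGENTS.md; other domain and framework names are not assumed here.

```python
from pathlib import Path


def list_policies(
    repo_root: Path, domain: str, framework: str, version: str = "v1"
) -> list[Path]:
    """Return the sorted .rego files for one framework (hypothetical helper)."""
    policy_dir = (
        repo_root / "aicertify" / "opa_policies" / domain / framework / version
    )
    if not policy_dir.is_dir():
        # Unknown framework or wrong repo root: report nothing instead of raising.
        return []
    # rglob also picks up policies in nested subdirectories.
    return sorted(policy_dir.rglob("*.rego"))


for policy in list_policies(Path("."), "international", "eu_ai_act"):
    print(policy)
```

Run from the repository root, this prints each policy path for the chosen framework; run anywhere else, it prints nothing.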

## Conventions Claude Code should respect

- **Use TodoWrite** for multi-step tasks (this is the user's preferred tracking).
- **Run the quickstart** before claiming a feature works end-to-end. Type-check and tests verify code correctness; the quickstart verifies *user-facing* correctness.
- **When editing OPA policies in `aicertify/opa_policies/`**, remind the user that these are vendored from [gopal](https://github.com/Principled-Evolution/gopal) and the upstream copy should usually be edited first.
- **Reports** — when changing report templates, open the generated PDF (`examples/outputs/eu_ai_act/`) and inspect it visually. Don't assume rendering correctness from the code alone.

## Useful skills

This project ships Claude Code skills in [`skills/`](skills/). To register them, copy into your Claude Code skills directory:

```bash
mkdir -p ~/.claude/skills && cp -r skills/* ~/.claude/skills/
```

Available slash commands once registered:

- `/run-compliance-check` — Run the end-to-end quickstart and surface the generated report
- `/evaluate-contract` — Evaluate a user-supplied contract JSON against any supported framework
- `/explain-regulation` — Walk every policy in a framework directory and explain what it checks
- `/draft-policy` — Scaffold a new Rego policy with metadata, default rule, and a test sibling

See [`skills/README.md`](skills/README.md) for the full list and conventions.

## What NOT to do in this repo

- Don't run `python examples/quickstart.py` repeatedly without cleaning `reports/` — the directory accumulates artifacts.
- Don't add Python ≥3.13 syntax (project pins to 3.12).
- Don't switch package managers: Poetry is canonical, and `pip install -e .` also works for users.
- Don't claim MCP-compatibility yet — an MCP server is on the roadmap, not shipped.
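
Cleaning `reports/` between quickstart runs can be sketched as below. `clean_reports` is a hypothetical helper, and it assumes the artifacts land in a `reports/` directory at the repository root, as the first item above implies. It deliberately leaves `examples/outputs/` alone.

```python
import shutil
from pathlib import Path


def clean_reports(repo_root: Path, dirname: str = "reports") -> Path:
    """Delete and recreate the report output directory (hypothetical helper)."""
    reports_dir = repo_root / dirname
    # ignore_errors=True keeps this from raising when the directory is absent.
    shutil.rmtree(reports_dir, ignore_errors=True)
    reports_dir.mkdir(parents=True, exist_ok=True)
    return reports_dir


# Example call from the repository root, before rerunning the quickstart:
# clean_reports(Path("."))
```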

## Session etiquette

The author prefers terse, surgical responses. Present the plan first when ambiguous; ask before introducing new abstractions.