feat(audit): agentic cert-lifecycle audit trail — attribution + tamper-evidence by fabriziosalmi · Pull Request #313 · fabriziosalmi/certmate

fabriziosalmi · 2026-06-15T11:37:18Z

Closes the gap raised on the v2.15.0 LinkedIn thread: when an MCP/AI agent renews or replaces certificates on a schedule, "it ran" was not an audit trail. CertMate could not say what changed, when, and on whose authority in a form a third party could check.

This is the non-risky scope of that work — attribution + tamper-evidence + the operator surface to use them. It deliberately stops short of signing/anchoring (see Out of scope).

What the analysis found

The starting state was worse than assumed: the success paths emitted no audit record at all — successful create / renew / deploy / auto-renew, and every scheduled (unattended) renewal, were invisible; only denials were logged. The convenience helpers log_certificate_created/renewed/... were dead code. And the log had no integrity (a line could be edited/deleted/reordered with no trace).

What this adds

Attribution (74149cb)

Every audit entry gains an additive, structured actor {kind, id, label, token_prefix, agent_session} and trigger {cause, job_id}. Old call sites and readers keep working (a system actor is synthesised when none is passed).
actor.kind is derived only from the authenticated identity: an is_agent-flagged scoped key → agent; a non-agent key / legacy bearer → api_token; a session → user; the scheduler → scheduler. The client-supplied X-CertMate-Agent-Session header is recorded as an informational claim and can never promote a caller to agent.
Emission now fires on the previously-silent paths: cert_service.issue_create/renew/reissue (covering API sync, async executor, and web), the scheduled renewals (actor.kind='scheduler' + job_id), and the auto-renew / manual-deploy endpoints.
The MCP server sends a per-process agent session (override CERTMATE_AGENT_SESSION).
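The kind-derivation rule above can be sketched as follows. This is a minimal illustration, not the PR's resolver: the `identity` dict shape (`auth`, `id`, `label`, `token_prefix`, `is_agent`) and the function name `derive_actor` are hypothetical. The only things taken from the description are the five actor fields, the kind values, and the rule that the session header is recorded but never changes the kind.

```python
from typing import Mapping, Optional


def derive_actor(identity: Optional[Mapping[str, object]],
                 headers: Mapping[str, str]) -> dict:
    """Build an audit actor from the AUTHENTICATED identity only.

    `identity` is a hypothetical dict produced by the auth layer;
    None means no caller context was passed (a system actor is synthesised).
    """
    if identity is None:
        kind = "system"
    elif identity.get("auth") == "scheduler":
        kind = "scheduler"
    elif identity.get("auth") == "session":
        kind = "user"
    elif identity.get("auth") == "api_key" and identity.get("is_agent") is True:
        kind = "agent"
    else:
        # Non-agent scoped key or legacy bearer token.
        kind = "api_token"

    ident = identity or {}
    return {
        "kind": kind,
        "id": ident.get("id"),
        "label": ident.get("label"),
        "token_prefix": ident.get("token_prefix"),
        # Client-supplied claim: stored for grouping, never used to pick `kind`.
        "agent_session": headers.get("X-CertMate-Agent-Session"),
    }
```

A non-agent key that sends the session header is still attributed as `api_token`; the header value only lands in `agent_session`.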

Tamper-evidence (17c77b7)

An append-only SHA-256 hash chain at data/audit/certificate_audit.chain.jsonl: {seq, entry, prev_hash, hash}, gap-free seq (a missing seq proves a deletion), byte-stable canon (IDN-safe). Single-writer under a lock, fsync per line, state advanced only after a durable write, recovered on restart, truncated-tail tolerated. No new dependencies (json + hashlib).
A standalone verifier (python -m modules.core.audit_verify, stdlib only, runs without CertMate) reports PASS or the exact seq + reason of the first break.
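A minimal sketch of the chain construction and verification described above, using only `json` and `hashlib`. Several details are assumptions made for illustration and may differ from the PR's module: the genesis `prev_hash` value, `seq` starting at 0, the exact hash preimage (here the canonical JSON of `seq`, `entry`, and `prev_hash`), and all function names.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first record


def canon(obj) -> bytes:
    """Byte-stable JSON: sorted keys, no whitespace, non-ASCII preserved."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def make_record(seq, entry, prev_hash):
    """The hash commits to seq + entry + prev_hash."""
    preimage = canon({"seq": seq, "entry": entry, "prev_hash": prev_hash})
    return {"seq": seq, "entry": entry, "prev_hash": prev_hash,
            "hash": hashlib.sha256(preimage).hexdigest()}


def build_chain(entries):
    records, prev = [], GENESIS
    for seq, entry in enumerate(entries):
        rec = make_record(seq, entry, prev)
        records.append(rec)
        prev = rec["hash"]
    return records


def verify(records):
    """Return (ok, seq_of_first_break, reason)."""
    prev = GENESIS
    for expected_seq, rec in enumerate(records):
        if rec.get("seq") != expected_seq:
            return (False, expected_seq, "seq gap or reorder")
        if rec.get("prev_hash") != prev:
            return (False, expected_seq, "prev_hash mismatch")
        recomputed = make_record(expected_seq, rec.get("entry"), prev)["hash"]
        if recomputed != rec.get("hash"):
            return (False, expected_seq, "hash mismatch")
        prev = rec["hash"]
    return (True, None, "PASS")
```

Because `seq` is gap-free, removing an interior record shows up as a sequence break at the position where the record should have been, and editing an entry shows up as a hash mismatch at that record.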

Operator surface (e2bd4a1)

GET /api/audit/verify (admin, read-only): runs the verifier, returns 200 intact / 409 broken.
Settings → API Keys: an "AI agent key" checkbox (+ an agent badge in the list) that sets is_agent, so the MCP server can use a key whose actions are attributed as agent.
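For a monitoring probe, the status codes of the verify endpoint can be mapped to an outcome as sketched below. The 200 and 409 meanings come from the description above, and the 503 case from the endpoint tests listed in the commit message. The `Authorization: Bearer` header, the function names, and the outcome strings are assumptions for illustration; the response body is not parsed because its shape is not described here.

```python
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Map the HTTP status of GET /api/audit/verify to a probe outcome."""
    if status == 200:
        return "intact"
    if status == 409:
        return "broken"
    if status == 503:
        return "unavailable"  # no audit chain to verify
    return "unknown"


def probe(base_url: str, token: str, timeout: float = 10.0) -> str:
    """Call the endpoint with an admin token (header scheme is assumed)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/audit/verify",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx; the status code still carries the verdict.
        return classify(exc.code)
```

A probe would typically alert on `broken` and warn on `unavailable` or `unknown`.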

Docs (e1e2307)

api.md: corrects the audit-log section (it wrongly claimed pure JSON, hiding the logging prefix and the local-vs-UTC time bases) and documents the new fields, the chain, the endpoint, and the verifier.
mcp.md: how to give an agent a dedicated is_agent key.
compliance.md (new): an honest operator-enablement mapping to NIS2 (strongest fit), EU AI Act Art. 50 (transparency spirit only), and ISO 42001 (records), with explicit non-claims.

Hardening from adversarial review (12da43e)

An independent review found a blocker: a non-object JSON line in the chain file could raise AttributeError out of AuditLogger.__init__ and abort app startup (taking the scheduler down). Fixed so a corrupt line is reported, never crashes; recovery can never abort __init__. Plus lock-on-verify and the tail-truncation limitation documented + tested.
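The failure mode and its fix can be illustrated with a small recovery routine. This is a sketch under assumptions, not the PR's `_recover_chain_state`: the function name, the return shape, and the problem messages are hypothetical. It shows the two properties described above: a valid-JSON line that is not an object is reported rather than dereferenced, and no exception can escape.

```python
import json


def recover_state(lines):
    """Return (next_seq, last_hash, problems) from chain lines; never raises.

    State comes from the last complete, well-formed record. Anything else
    is collected in `problems` as (line_number, reason).
    """
    next_seq, last_hash, problems = 0, None, []
    try:
        for lineno, raw in enumerate(lines, start=1):
            raw = raw.strip()
            if not raw:
                continue
            try:
                rec = json.loads(raw)
            except ValueError:
                # Covers a truncated trailing line from an interrupted write.
                problems.append((lineno, "not valid JSON"))
                continue
            if not isinstance(rec, dict):
                # [1,2,3], 42, null: valid JSON, but rec.get() would fail.
                problems.append((lineno, "JSON value is not an object"))
                continue
            seq, digest = rec.get("seq"), rec.get("hash")
            if isinstance(seq, int) and isinstance(digest, str):
                next_seq, last_hash = seq + 1, digest
            else:
                problems.append((lineno, "missing seq or hash"))
    except Exception as exc:  # last line of defence: startup must not abort
        problems.append((0, f"recovery failed: {exc!r}"))
    return next_seq, last_hash, problems
```

A caller in `__init__` could disable the chain when `problems` contains a recovery failure, instead of letting the error propagate.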

Threat-model honesty (stated in code and docs)

The chain detects interior modification / deletion / reorder by anyone without the writer's running state. It does not detect tail truncation without an external head anchor, and it does not bind the operator (who holds the file and could rewrite the whole chain). Both require external anchoring of signed checkpoints — Phase 3, deliberately not in this PR.
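The tail-truncation limit can be demonstrated in a few lines. The chain format below is a simplified stand-in (empty-string genesis, `seq` from 0), and `matches_anchor` is a hypothetical illustration of what an external head anchor would add; this PR does not implement anchoring.

```python
import hashlib
import json


def link(seq, entry, prev_hash):
    body = json.dumps({"seq": seq, "entry": entry, "prev_hash": prev_hash},
                      sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return {"seq": seq, "entry": entry, "prev_hash": prev_hash,
            "hash": hashlib.sha256(body.encode("utf-8")).hexdigest()}


def chain_of(entries):
    out, prev = [], ""
    for seq, entry in enumerate(entries):
        out.append(link(seq, entry, prev))
        prev = out[-1]["hash"]
    return out


def links_ok(records):
    """Internal consistency only: every record matches its recomputed link."""
    prev = ""
    for seq, rec in enumerate(records):
        if rec != link(seq, rec["entry"], prev):
            return False
        prev = rec["hash"]
    return True


def matches_anchor(records, anchor):
    """anchor = (seq, hash) of a head that was recorded somewhere off-box."""
    seq, digest = anchor
    return len(records) > seq and records[seq]["hash"] == digest


full = chain_of(["create a.example", "renew a.example", "renew b.example"])
anchor = (full[-1]["seq"], full[-1]["hash"])
truncated = full[:-1]  # drop the newest entry

print(links_ok(truncated))                # the shortened chain is still consistent
print(matches_anchor(truncated, anchor))  # only the external head exposes the loss
```

A shortened chain is indistinguishable from a chain that simply never received the later entries, which is why detecting truncation needs a head value stored outside the file.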

Out of scope (by design)

Ed25519 signed export bundle + external anchoring (Phase 3) — introduces signing-key lifecycle (loss/rotation) and touches SMTP/S3; a separate decision.
Including the chain in the unified backup zip — touches the backup contract.

Tests / safety

+60 tests (attribution, kind derivation, header-is-a-claim, success/failure emission across create/renew, scheduler attribution, chain write/verify, modify/delete/reorder detection, restart recovery, non-object-line safety, the verify endpoint). Full suite 1530 passed, 17 skipped. No emoji. Template theme-token gate passes.
Audit emission is isolated best-effort everywhere (try/except), so it cannot break or alter certificate issuance/renewal.
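The best-effort isolation can be sketched as a small wrapper. The names `emit_best_effort` and `renew`, and the callable-based `emit` parameter, are hypothetical; the point is only that an audit failure is swallowed and logged, so the certificate operation's result is unchanged.

```python
import logging

log = logging.getLogger("audit_sketch")


def emit_best_effort(emit, operation, **fields):
    """Call the audit emitter; report failure without raising."""
    try:
        emit(operation, **fields)
        return True
    except Exception:
        # Log the operation type only; no domain or token data in this line.
        log.debug("audit emission failed for operation %s", operation,
                  exc_info=True)
        return False


def renew(domain, emit):
    """Stand-in for a renewal: the result does not depend on the audit call."""
    result = f"renewed:{domain}"
    emit_best_effort(emit, "renew", domain=domain, status="success")
    return result
```

With an emitter that raises, `renew` still returns its normal result; the failure is visible only in the debug log.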

Note before merge: Phase 1 wraps the issuance choke point (issue_create/renew/reissue) additively, and the renewal tests pass. Because that path was touched, a real-cert smoke test (issue + renew against a test subdomain) is the prudent final check.

🤖 Generated with Claude Code

…mit on the silent paths

When an MCP/AI agent renews or replaces certificates on a schedule, the audit log could not say what changed, when, and on whose authority. Two gaps caused this: the success paths that matter emitted no audit record at all, and the records that did exist carried only a coarse `user` string.

Attribution (Phase 1 of l0 #408):
- audit entries gain an additive, structured `actor` ({kind, id, label, token_prefix, agent_session}) and `trigger` ({cause, job_id}). log_operation synthesises a {kind:'system'} actor when none is passed, so every existing call site and old reader keeps working.
- new audit_context resolver maps the AUTHENTICATED identity into (actor, trigger): a scoped key flagged is_agent -> kind='agent', a non-agent scoped key / legacy bearer -> 'api_token', a session -> 'user'. The client-supplied X-CertMate-Agent-Session / -Agent-Id headers are recorded as an informational claim only and never promote a caller to 'agent'.
- auth threads the stable api_key_id + token_prefix + is_agent into request.current_user; create_api_key accepts and persists is_agent (exposed on POST /api/keys) so an operator can dedicate an agent-flagged key.

Emission on the previously-silent paths:
- cert_service issue_create / issue_renew / issue_reissue now emit an attributed success or failure record at the choke point all of API, async executor, and web routes funnel through (the async path captures the context synchronously).
- scheduled renewals (CertificateManager + ClientCertificateManager check_renewals) emit per-domain records with actor.kind='scheduler' and trigger.job_id. This is the headline unattended-agent case, previously invisible.
- the auto-renew toggle and manual deploy endpoints emit attributed records.
- the MCP server sends a per-process X-CertMate-Agent-Session (override via CERTMATE_AGENT_SESSION) so an agent session's actions can be grouped.

16 new tests cover kind derivation, the header-is-a-claim rule, success/failure emission across create/renew, and scheduler attribution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The audit log was a plain append-only text file with no integrity: a single line could be edited, deleted, or reordered and nothing would detect it (the readers even skip malformed lines silently, so a deletion left no trace).

This adds a parallel, append-only certificate_audit.chain.jsonl that records every audit entry inside a SHA-256 hash chain:
- each line is canonical JSON of {seq, entry, prev_hash, hash}, where hash commits to seq + entry + prev_hash, and prev_hash links to the previous line's hash. seq is a gap-free monotonic counter, so a missing seq proves a deletion. canon uses sorted keys / no whitespace / UTF-8 with non-ASCII preserved, so an IDN domain hashes identically on writer and verifier.
- the writer keeps single-writer next-seq/last-hash state under a lock (Flask request threads and the APScheduler renewal thread share one AuditLogger), fsyncs each line, and only advances state after a durable write. A chain failure never breaks audit logging or the audited operation, and never fabricates a phantom gap. State is recovered from the last complete line on restart; a truncated trailing line (interrupted write) is tolerated.
- the chain is written under the persistent data/ tree (data/audit), not the ephemeral logs/ tree, so it is the durable verifiable artifact. Disable with CERTMATE_AUDIT_CHAIN=0.

A standalone verifier (python -m modules.core.audit_verify [chain.jsonl], stdlib only, no CertMate import needed) recomputes the chain and reports PASS or the exact seq and reason of the first break (modification / deletion / reorder / truncation). Exit 0 intact, 1 broken, 2 missing/IO.

Honest threat model, stated in the module: the chain detects tampering by anyone WITHOUT the writer's running state, but does not bind the operator, who holds the file and can recompute the whole chain. Constraining the operator needs external anchoring of signed checkpoints (Phase 3), deliberately not implemented here.

No new dependencies (json + hashlib).

18 new tests: canon determinism incl. non-ASCII, write+verify, modify/delete/reorder detection, restart recovery, truncated-tail tolerance, kill switch, separate chain dir, and the CLI exit codes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…I toggle

Make the Phase 1/2 audit trail usable without the shell:
- GET /api/audit/verify (admin, read-only) runs the hash-chain verifier and returns its result, 200 when intact and 409 when broken, so an operator or a monitoring probe can confirm integrity (or alert) over the API.
- Settings -> API Keys gains an "AI agent key" checkbox (and the list shows an "agent" badge). It sets is_agent on the key so the agent's actions are attributed as actor.kind='agent' (previously settable only via the API).

3 endpoint tests (intact -> 200, tamper -> 409, no audit -> 503).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…compliance

- api.md: correct the audit-log section (each line is a logging-prefixed JSON message split on " - INFO - ", not pure JSON; local vs UTC time bases), and document the actor/trigger fields, the tamper-evident chain, GET /api/audit/verify, and the standalone verifier CLI.
- mcp.md: how to give an agent a dedicated is_agent-flagged key so its actions are attributed as actor.kind='agent', plus the CERTMATE_AGENT_SESSION / CERTMATE_AGENT_ID env vars.
- compliance.md (new): an honest operator-enablement mapping to NIS2 (strongest fit), EU AI Act Art. 50 (transparency spirit only), and ISO 42001 (records), with explicit non-claims and the threat-model limits (the local chain does not bind the operator; off-box anchoring is not implemented).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lock-on-verify, tail-truncation docs

An adversarial review of the branch found one blocker and two refinements:
- BLOCKER: verify_chain and _recover_chain_state assumed every parsed JSON line was an object. A valid-JSON-but-non-object line ([1,2,3], 42, null) raised AttributeError. Because _recover_chain_state runs in AuditLogger.__init__ (called unguarded by the factory) under an except that only caught OSError, a single malformed line in the chain file could abort app startup, taking the renewal scheduler down with it. Now: verify_chain reports a non-object line as malformed (never raises), and recovery skips non-object lines and catches any exception, disabling the chain rather than ever aborting __init__.
- In-process AuditLogger.verify_chain() now takes the append lock so a verify racing an in-flight append cannot observe a half-written final line and report a spurious truncation (the standalone CLI verifier still runs lock-free by design).
- Documented the inherent limitation that tail truncation (removing entries from the end) is not detected without an external head anchor (Phase 3), in api.md, compliance.md, and a pinning regression test.

7 new tests: 5 non-object-line cases, recovery surviving a non-object line, and the tail-truncation limitation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The 'AI agent key' badge introduced teal utility classes (bg-teal-100/text-teal-800 + dark variants) not previously present in the purged bundle. Rebuild static/css/tailwind.min.css so the frontend-css CI gate (npm run css:build + git diff --exit-code) stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Minor: new capability and API surface (not a bugfix).
- Attribution (actor/trigger) on every certificate-lifecycle audit entry, and emission on the previously-silent success and scheduled-renewal paths.
- Tamper-evident SHA-256 hash chain + standalone verifier.
- New GET /api/audit/verify (admin) and an "AI agent key" (is_agent) toggle.
- Docs: audit/attribution, the verifier, and an honest compliance mapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-text FP)

CodeQL flagged `_scrub_log(domain)` in the _audit_emit failure-path debug log as "clear-text logging of sensitive information (password)". It is a false positive (the value is a certificate domain name, already scrubbed) caused by field-insensitive taint on the `prepared` dict, which holds both the domain and the attribution context incl. token_prefix. The domain added no diagnostic value to that line, so drop it: the operation type is enough, and the scanner stays green without a manual dismissal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fabriziosalmi and others added 6 commits June 15, 2026 13:02

github-advanced-security AI found potential problems Jun 15, 2026


Comment thread modules/core/cert_service.py Fixed

fabriziosalmi and others added 2 commits June 15, 2026 13:57

fabriziosalmi merged commit 9763077 into main Jun 15, 2026
6 of 7 checks passed

fabriziosalmi deleted the feat/agentic-audit-trail branch June 15, 2026 12:21

fabriziosalmi merged 8 commits into main from feat/agentic-audit-trail
