Add baseline integrity (M2): provenance, audit log, verify command by pengfei-threemoonslab · Pull Request #89 · ThreeMoonsLab/agents-shipgate

pengfei-threemoonslab · 2026-05-17T01:33:36Z

Summary

Bumps baseline schema 0.4 → 0.5 with an optional self-describing BaselineProvenance (scanner_version / run_id / recorded_at / reason / expires) on every entry; legacy 0.2/0.3/0.4 baselines still load.
Adds an append-only audit log at <baseline-dir>/baseline-audit.log recording every baseline save with SHA-256 hashes before/after plus added/removed fingerprint deltas. The wire shape is documented as stable in STABILITY.md.
Adds manifest.baseline.integrity_mode: off | warn | strict (default warn) and three new check IDs: SHIP-BASELINE-INTEGRITY-MISMATCH (critical), SHIP-BASELINE-ENTRY-EXPIRED (high), SHIP-BASELINE-ENTRY-STALE (low).
Adds agents-shipgate baseline verify [--strict] [--json] [--audit-log] and reserves new exit code 6 (baseline_integrity_failure) for verify --strict; scan continues to use 20 for gate failure regardless of integrity_mode.
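As a rough illustration of the audit-log item above, the sketch below appends one JSON line per save with before/after SHA-256 hashes and fingerprint deltas. It is not the code in this PR: the function names, the JSON key names, and the example version string are assumptions for illustration, and the stable wire shape is whatever STABILITY.md documents.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Set


def sha256_of(path: Path) -> Optional[str]:
    """Return the hex SHA-256 of a file, or None if it does not exist yet."""
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def append_audit_record(
    baseline_path: Path,
    new_text: str,
    old_fps: Set[str],
    new_fps: Set[str],
    run_id: str,
    scanner_version: str,
) -> dict:
    """Write the baseline, then append one JSON line describing the save.

    Hypothetical helper: key names below are assumed, not the real wire shape.
    """
    audit_path = baseline_path.parent / "baseline-audit.log"
    hash_before = sha256_of(baseline_path)
    baseline_path.write_text(new_text, encoding="utf-8")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "scanner_version": scanner_version,
        "hash_before": hash_before,
        "hash_after": sha256_of(baseline_path),
        "added": sorted(new_fps - old_fps),
        "removed": sorted(old_fps - new_fps),
    }
    # Open in append mode so earlier records are never rewritten by the writer.
    with audit_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Because each record carries both hashes, consecutive saves form a chain: the hash_before of one record should equal the hash_after of the previous one.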

Type

What this fixes (why)

Closes the silent-baseline-edit trust hole identified in the M2 plan. Today a developer can hand-edit baseline.json to add a fingerprint and the scanner accepts it, so a release_decision.decision of blocked becomes review_required without an audit trail. For a tool that pitches itself as a release gate, this is a credibility hole. M2 makes casual or accidental edits observably wrong in code review.

Honest threat model (written into both STABILITY.md and the check rationale): the audit log is tamper-evident, not tamper-proof. A well-resourced adversary who atomically rewrites both baseline.json and the audit log defeats verify. The goal is for git log .agents-shipgate/baseline-audit.log to expose casual edits, not to defeat a determined attacker.
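To make the tamper-evident versus tamper-proof distinction concrete, here is a minimal sketch of the hash comparison, assuming the last audit record carries a hash_after field. The function name and issue strings are hypothetical, and the real verify_baseline covers more issue kinds than this.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def verify_baseline_hash(baseline_path: Path, audit_path: Path) -> List[str]:
    """Return issue strings; an empty list means the hash check passed.

    Hypothetical sketch: only compares the baseline file against the most
    recent audit record's "hash_after" (an assumed key name).
    """
    if not audit_path.exists():
        return ["missing audit log"]
    lines = [
        ln
        for ln in audit_path.read_text(encoding="utf-8").splitlines()
        if ln.strip()
    ]
    if not lines:
        return ["audit log is empty"]
    last = json.loads(lines[-1])
    actual = hashlib.sha256(baseline_path.read_bytes()).hexdigest()
    if actual != last["hash_after"]:
        return ["hash mismatch: baseline changed outside `baseline save`"]
    return []
```

A hand edit to baseline.json alone changes the file hash and is flagged. Rewriting the audit log in the same change makes the check pass again, which is the limitation the threat model states; the edit to the log is then what shows up in git history.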

Design notes for reviewers

Integrity findings bypass checks.ignore and checks.severity_overrides. Silencing tamper detection would defeat the property the audit log defends. Severity-tuning is M1's job, not M2's. See src/agents_shipgate/checks/baseline_integrity.py.
Re-save preserves prior provenance. Reviewer-set reason and expires survive subsequent baseline save runs; only newly-added fingerprints get a fresh stamp. Test test_baseline_from_report_preserves_prior_provenance enforces this — losing it would break the audit-log linkage.
integrity_mode: off is the v0.17 escape hatch. The default is warn so adopters don't break on upgrade; the suggested default for v0.18 is strict.
SHIP-BASELINE-ENTRY-STALE splits cleanly: deprecated_check_id is detected from the baseline alone (fires under both verify and scan); resolved_not_pruned is scan-aware (only under scan --baseline X). The verify command intentionally does not implement the scan-aware path.
Pipeline integration runs the integrity findings through assign_finding_ids and annotate_remediation again, so they participate fully in the report (fingerprints, agent_action enum, autofix_safe) but skip dedupe (they were never duplicates) and skip suppressions (intentional).
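The "re-save preserves prior provenance" note can be pictured with the sketch below. It is an assumption-laden illustration, not the baseline_from_report implementation: the Provenance dataclass mirrors the field names listed in the Summary, and merge_provenance is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Provenance:
    """Illustrative stand-in for BaselineProvenance (field names from the PR)."""

    scanner_version: str
    run_id: str
    recorded_at: str
    reason: Optional[str] = None
    expires: Optional[str] = None


def merge_provenance(
    prior: Dict[str, Optional[Provenance]],
    current_fps: Iterable[str],
    fresh: Provenance,
) -> Dict[str, Provenance]:
    """Keep existing stamps; stamp only new or legacy (None) entries."""
    merged: Dict[str, Provenance] = {}
    for fp in current_fps:
        existing = prior.get(fp)
        # Reviewer-set reason/expires live on the existing stamp, so reusing
        # it is what lets them survive a re-save.
        merged[fp] = existing if existing is not None else fresh
    return merged
```

Fingerprints absent from the current report simply drop out of the result, and legacy entries loaded with no provenance pick up the fresh stamp on the next save.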

Verification

CI is authoritative for python -m ruff check ., python -m compileall -q src tests, and python -m pytest.

Additional local checks run:

pytest: 1119 passed, 3 skipped (28 new tests in tests/test_baseline_integrity.py covering audit-log primitives, baseline_from_report provenance behavior across new / resave / legacy paths, every verify_baseline issue kind, the scan-pipeline integration in warn/strict/off modes, and the CLI baseline verify surface).
ruff check .: clean.
python scripts/generate_schemas.py --check: clean (committed docs/checks.json, docs/manifest-v0.1.json, and regenerated llms-full.txt).
Manual: agents-shipgate baseline verify --help shows the new command; ran a full save/verify/tamper/verify cycle on samples/support_refund_agent.
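When scripting a cycle like the manual one above in CI, the two exit codes named in the Summary can be told apart with something like the sketch below. Only codes 6 and 20 come from this PR; treating 0 as success is the usual convention and an assumption here, and classify_exit is a hypothetical helper, not part of the CLI.

```python
# Exit codes named in this PR; every other non-zero code is left unclassified.
EXIT_BASELINE_INTEGRITY_FAILURE = 6  # `baseline verify --strict`
EXIT_GATE_FAILURE = 20  # `scan` gate failure, regardless of integrity_mode


def classify_exit(command: str, code: int) -> str:
    """Map a (command, exit code) pair to a coarse label for CI logs."""
    if code == 0:
        return "ok"  # assumed: conventional success code
    if command == "verify" and code == EXIT_BASELINE_INTEGRITY_FAILURE:
        return "baseline_integrity_failure"
    if command == "scan" and code == EXIT_GATE_FAILURE:
        return "gate_failure"
    return "unknown"
```

Keeping the two codes distinct lets a pipeline report "the baseline was edited out of band" separately from "the release gate failed".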

Release-readiness notes

No user-code import added to default scan paths
No network access added to default scan paths
New or changed check IDs are documented in docs/checks.md
Report/schema changes are additive or documented in STABILITY.md

What's NOT in this PR (intentional, queued for follow-up)

Severity floor / acknowledge_overrides — that's M1.
Allowing severity_overrides to tune integrity-finding severities — see "Integrity findings bypass …" above; M1 is the right venue.
Cryptographic signing of the baseline — explicitly out of scope per the threat model.
PR-comment surface for integrity findings in the GitHub Action — Action passes the findings through unchanged via report.json; rendering tweaks deferred.

🤖 Generated with Claude Code

Bumps baseline schema 0.4 → 0.5 and closes the silent-baseline-edit trust hole identified in the M2 plan. Casual / accidental edits to `baseline.json` are now observably wrong in code review.

Schema
- Adds optional `BaselineProvenance` to each `BaselineFinding` (scanner_version, run_id, recorded_at, reason, expires). Older v0.2/v0.3/v0.4 baselines still load with `provenance=None`; re-running `baseline save` upgrades them.
- Re-saves preserve existing provenance for fingerprints already in the baseline. Only newly-added entries get a fresh stamp, so reviewer-set `reason` / `expires` survive subsequent saves.

Audit log
- New `core/baseline_audit.py`. `baseline save` appends one JSON line to `<baseline-dir>/baseline-audit.log` per save: timestamp, run_id, scanner_version, hash_before, hash_after, added/removed fingerprint deltas. Append-only on the writer side.
- Honest threat model: tamper-evident, not tamper-proof. An adversary who rewrites both files atomically defeats `verify`; the goal is making casual edits visible in `git log`.

Manifest
- `baseline.integrity_mode: off | warn | strict` (default warn). `off` is the back-compat escape hatch; `strict` makes SHIP-BASELINE-INTEGRITY-MISMATCH set `blocks_release=true`.

Checks
- Three new check IDs in `category: baseline`:
  - SHIP-BASELINE-INTEGRITY-MISMATCH (critical): hash mismatch, missing audit log, entry references unknown run_id, or entry pre-dates v0.5 and lacks provenance.
  - SHIP-BASELINE-ENTRY-EXPIRED (high): `provenance.expires` past.
  - SHIP-BASELINE-ENTRY-STALE (low): deprecated check ID or resolved-not-pruned (scan-aware).
- Integrity findings bypass `checks.ignore` and `checks.severity_overrides`; silencing tamper detection would defeat the trust property the audit log defends.

CLI
- New `agents-shipgate baseline verify` command with `--strict`, `--json`, `--audit-log` flags.
- New exit code 6 (`baseline_integrity_failure`) emitted only by `baseline verify --strict` on hash mismatch. `scan` continues to use 20 for gate failure regardless of integrity_mode.

Tests
- 28 new tests in `tests/test_baseline_integrity.py` covering audit-log primitives, provenance behavior in `baseline_from_report`, every `verify_baseline` issue kind, the scan-pipeline integration in warn/strict/off modes, and the CLI `baseline verify` surface.
- 1119 total tests pass; ruff clean; schemas regenerated and `generate_schemas.py --check` is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pengfei-threemoonslab wants to merge 1 commit into main from m2-baseline-integrity
