The ev model

This is the model in depth — deeper than the project README. It describes the on-disk Tick schema, the parts of a decision (Ground, Check), content-addressed identity and the frozen golden vectors, append-only immutability, the refusals ev verify enforces, and the honesty / trust boundary. Everything here is accurate to the code; nothing is overstated.

For the commands that produce and read these records, see commands.md.

The Tick

A Tick is one decision in the chain. Its fields split into two groups by whether they enter the content hash.

Hashed payload — the four fields that define the decision's identity:

  • decision — the decision text.
  • observe — the situation observed when the decision was made (may be empty).
  • grounds — the ordered list of reasons the decision rests on (see below).
  • parent_id — the id of the predecessor tick; "" on genesis.

Bookkeeping — recorded but excluded from the hash, so they can change without forging a new identity:

  • id — the content-addressed identifier (the hash output; see Identity).
  • status — the tick's status string ("live").
  • held_since — an RFC 3339 timestamp stamped when the tick is written (at ev decide / ev guard). It is bookkeeping (excluded from the hash), so it never affects the id.
  • blame — the human author on the hook for this decision.
  • authority — an optional declared tag (user-ruled | agent-disposable), excluded from the hash; written only when set, and surfaced by show/list/brief/reopen.
  • jurisdiction — an optional declared tag (A | B | C | D), excluded from the hash; written only when set. A/B may gate; C/D are detect-only — structurally ungateable (see Jurisdiction below). Surfaced by show/list/reopen.
  • source_ref — an opaque, producer-supplied source identity ev never interprets: a non-empty string (e.g. R2289, #555, an issue ref) or a non-empty structured object (JSON). Excluded from the hash; written only when set. ev derives exactly one thing from it — a stable dedup/reconcile key (the string verbatim, or the deterministic sorted-compact JSON of an object) — and compares only that key, never the contents. Used by ev migrate to dedup and reconcile a backfill. Surfaced by show/list/reopen. (On the ev decide / ev guard path it is set with --source-ref as a plain string; the canonical intake additionally accepts a structured object — see commands.md.)
  • provenance — an optional declared tag from the closed vocabulary {imported, agent-proposed, human-now}, excluded from the hash; written only when set, absent ⇒ human-now. It records how the decision entered the ledger (see Provenance below). Vocab-validated; an out-of-vocabulary value is refused.
  • supersedes / ratifies — the two relation-overlay edges, each (when set) the 12-hex id of the tick this one points at. supersedes (written by ev supersede) names the tick this one supersedes; ratifies (written by ev ratify) names the agent proposal this human-now child ratifies. Both are non-hashed, so they never move the id; the brief/list collapse reads supersedes to drop the superseded tick precisely. These are the two specific, adopter-driven bridges — the general case-law graph (governed-by / case-of / arbitrary typed edges) is deliberately not built, and a machine-fence test pins that the overlay edges are exactly these two, so a third can only be added deliberately.

On disk a tick is stored as pretty JSON containing the hashed payload keys plus the bookkeeping keys at top level (id, status, held_since, blame, and — when set — authority, jurisdiction, source_ref, provenance, supersedes, ratifies). ev show prints that file as-is. The genesis tick on disk looks like:

{
  "decision": "freeze the retrieval schema for v2",
  "observe": "evaluating retrieval backend",
  "grounds": [
    {
      "claim": "team still wants a frozen schema",
      "supports": "chosen",
      "check": { "by": "person", "ref": "Q3 infra review" }
    },
    { "claim": "pgvector would lock our schema", "supports": "rejected:pgvector" }
  ],
  "parent_id": "",
  "id": "e2b337f53a1f",
  "status": "live",
  "held_since": "<rfc3339-time>",
  "blame": "Wang Yu"
}

This is the frozen genesis golden vector, so its id is genuinely e2b337f53a1f (the same id pinned in the Identity table below). The held_since is shown as a placeholder; the real one is an RFC 3339 time stamped at write time.

Because blame, status, held_since, authority, jurisdiction, source_ref, provenance, supersedes, and ratifies sit outside the hash, blanking blame on disk does not change the id — which is exactly why ev verify checks blame separately (R5). Equally, tagging a jurisdiction, a source_ref, a provenance, or a supersedes edge on a decision leaves its id untouched: these are declared bookkeeping, never part of the decision's identity. (The non-hashed supersedes edge is what lets a supersession be recorded as an overlay on top of the immutable chain without rewriting — or re-identifying — the tick it supersedes.)

source_ref is the adopter's concept, carried opaquely

A field is first-class in ev only if ev's own behavior branches on its meaning. ev never branches on source_ref's contents — it derives one dedup key and compares only that — so source_ref is not an ev concept. It is the adopter's concept (a "round", a ticket, a sprint, a work-unit) carried through opaquely. ev has no notion of "rounds": whatever the source calls its unit of work, ev only ever sees an identity to dedup on.
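As an illustration of the one thing ev derives from source_ref, the sketch below computes a dedup key from either shape. The function name dedup_key is hypothetical, and the json.dumps settings are only an assumed rendering of "deterministic sorted-compact JSON"; ev's actual encoder may differ in detail.

```python
import json

def dedup_key(source_ref):
    """Derive the single key compared for dedup/reconcile (a sketch, not ev's code)."""
    if isinstance(source_ref, str) and source_ref:
        return source_ref  # a non-empty string is used verbatim
    if isinstance(source_ref, dict) and source_ref:
        # Assumed rendering of "sorted-compact JSON": sorted keys, no whitespace.
        return json.dumps(source_ref, sort_keys=True,
                          separators=(",", ":"), ensure_ascii=False)
    raise ValueError("source_ref must be a non-empty string or a non-empty object")
```

Two objects that differ only in key order yield the same key, which is all a dedup needs; the contents are never inspected beyond that.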

Ground

A Ground is a single reason a decision rests on. It has three parts:

  • claim — the reason text (non-empty).
  • supports — either the literal "chosen" (a reason for the decision taken) or "rejected:<option>" (a road-not-taken: a reason an alternative was declined). For a rejected support, the <option> part must be non-empty.
  • check — an optional Check that keeps the ground honest over time. When absent, the check key is omitted entirely from the JSON — it never serializes as null.

ev decide --assume <claim> opens a chosen ground; ev decide --reject "<opt>: <why>" opens a rejected road with claim = <why> and supports = rejected:<opt>.
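The mapping from the two flags to a Ground can be sketched as follows. Both helper names are hypothetical, and splitting the --reject value on its first colon is an assumption; the text above only fixes the resulting claim and supports values.

```python
def ground_from_assume(claim: str) -> dict:
    """Build a chosen ground from an --assume value (sketch)."""
    if not claim.strip():
        raise ValueError("claim must be non-empty")
    return {"claim": claim, "supports": "chosen"}

def ground_from_reject(value: str) -> dict:
    """Build a rejected road from a '<opt>: <why>' value (sketch)."""
    # Assumption: the option is everything before the first ':'.
    option, sep, why = value.partition(":")
    option, why = option.strip(), why.strip()
    if not sep or not option or not why:
        raise ValueError("expected '<opt>: <why>' with both parts non-empty")
    return {"claim": why, "supports": f"rejected:{option}"}
```

Neither helper emits a check key, which matches the rule that an absent check is omitted rather than serialized as null.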

Check

A Check is a binding that keeps a ground honest as the world changes. It is one of two shapes, distinguished on disk by the by field:

  • Person — a human re-check. { "by": "person", "ref": <reference> }, where reference names when/where a person re-affirms the ground (e.g. "Q3 infra review"). Created with --revisit. A Person check binds only a chosen ground — a rejected road never carries a human re-check.

  • Test — a test that guards the ground. { "by": "test", "ref": <selector>, "verified_at_sha": <40-hex>, "counter_test": <selector>, "liveness": { … } }. A Test check binds a chosen ground unconditionally, or a rejected road when the decision is --authority user-ruled and a counter-test is present (a "tripwire" — see User-ruled rejected tripwires below):

    • reference (ref) — the test selector that should pass while the claim holds (or, for a tripwire, while the rejected road stays closed).
    • verified_at_sha — the commit the test was last verified at; exactly 40 lowercase hex.
    • counter_test — required for an authored binding, optional for a harvested one: the test that should flip red if the claim breaks. When present it is a non-empty selector; when absent the key is omitted entirely from the JSON (it never serializes as null or ""). An authored binding from ev decide / ev guard always carries one (such a binding without it is refused as vacuous), and a rejected-road tripwire always carries one (no harvested rejected-road tripwire). A harvested binding (ev migrate, chosen grounds only) deliberately carries none — see Harvested bindings below.
    • liveness — three non-empty string sets that say where the test must keep running for the binding to be considered alive: platforms, triggered_by, surfaces. In the canonical form these sets are sorted and de-duplicated, so their order does not affect identity. A Test check is created with --assume-test (plus --counter-test, --on-platform, --triggered-by, --surface), or after the fact with ev guard; or, without a counter-test, harvested by ev migrate.

A ground may carry at most one check, and never both shapes at once.
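The two shapes can be told apart mechanically. The sketch below is a minimal validator under the rules listed above; the function name and the error strings are illustrative, not ev's own messages.

```python
import re

SHA_RE = re.compile(r"[0-9a-f]{40}")
LIVENESS_KEYS = {"platforms", "triggered_by", "surfaces"}
TEST_KEYS = {"by", "ref", "verified_at_sha", "counter_test", "liveness"}

def check_shape_errors(check: dict) -> list:
    """Return illustrative error strings for a malformed check (empty list = ok)."""
    errors = []
    by = check.get("by")
    if by == "person":
        if set(check) != {"by", "ref"}:
            errors.append("person check must carry exactly by/ref")
    elif by == "test":
        required = TEST_KEYS - {"counter_test"}  # counter_test is optional
        if not required <= set(check) or not set(check) <= TEST_KEYS:
            errors.append("test check has missing or unknown keys")
        if not SHA_RE.fullmatch(str(check.get("verified_at_sha", ""))):
            errors.append("verified_at_sha must be 40 lowercase hex")
        if "counter_test" in check and not check["counter_test"]:
            errors.append("counter_test, when present, must be non-empty")
        liveness = check.get("liveness", {})
        if set(liveness) != LIVENESS_KEYS or not all(liveness.values()):
            errors.append("liveness needs three non-empty sets")
    else:
        errors.append('by must be "test" or "person"')
    return errors
```

A Test check without counter_test passes this shape check; whether such a binding is acceptable depends on how it was produced (authored or harvested), which the shape alone cannot tell.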

User-ruled rejected tripwires

By default a rejected road (supports = rejected:<opt>) carries no check. When the decision is --authority user-ruled, a rejected road may carry a Test check — a tripwire: a falsifiable check on a road the human deliberately closed. The rules:

  • the tripwire must carry a counter-test (a rejected-road check is never harvested), plus full 3-key liveness — exactly the authored-binding grammar;
  • the check reads green while the road stays closed and red when someone re-walks it (re-introduces the closed option); the counter-test proves the check can flip;
  • a --revisit (human re-check) on a rejected road is always refused — a closed road is guarded by a structural tripwire, not by a person re-confirming a non-choice.

A tripwire binds only a structural token (a grep-able artifact: a file change, a commit, a schema). A prose re-walk with no structural token — e.g. a GitHub milestone re-assignment (the canonical #1194 case) — has nothing to bind and stays surface-only: the tripwire does not, and cannot, catch it. This is the honest scope, not a flaw — see the structural-jurisdiction limit in the honesty boundary. (The same authority=user-ruled + counter-test rule is enforced on the canonical-intake path, so the constraint is structural across every producer.)
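The binding rules for chosen grounds and rejected roads can be condensed into one predicate. This is a sketch under the rules stated above; binding_allowed is a hypothetical name, and it deliberately ignores liveness and the authored-versus-harvested distinction.

```python
def binding_allowed(supports: str, check, authority=None) -> bool:
    """Decide whether `check` may bind a ground with the given `supports` value."""
    if check is None:
        return True  # a ground with no check is always fine
    rejected = supports.startswith("rejected:")
    if check["by"] == "person":
        return not rejected  # --revisit on a rejected road is always refused
    if not rejected:
        return True  # a Test check binds a chosen ground unconditionally
    # Tripwire: only under user-ruled authority, and only with a counter-test.
    return authority == "user-ruled" and bool(check.get("counter_test"))
```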

Harvested bindings (counter-test absent)

A harvested Test binding is one whose counter_test is absent. It is the shape ev migrate produces when it adopts an existing test as a check: the test is real and its liveness is fully declared (a harvest still demands a platform, a trigger, and a surface — you cannot half-harvest), but its falsifiability was never proven — no one has shown a counter-test that flips red, so the binding could in principle be vacuous.

ev check evaluates a harvested binding exactly as a normal one — a passing harvested test still reads green, a failing one still reads red — and never silently upgrades it. What it adds is an honest annotation: a harvested row is tagged (harvested — falsifiability not proven; …), and a trailing harvested-unproven: N of M test bindings have no counter-test (run ev guard to add one) line counts the debt. ev guard is the way out: add a --counter-test and the harvested binding becomes a proven, authored one (a new child tick, since the check is hashed).

Because the canonical encoding omits counter_test on absence (rather than emitting it as null), a harvested id is just as byte-stable as a counter-test-carrying one — a third frozen golden, harvested (0cf784b51331), pins exactly that.
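Counting the harvested debt is a matter of looking for Test checks whose counter_test key is absent. The sketch below shows one way to do that over in-memory ticks; the helper names are hypothetical, and only the wording of the summary line is taken from the text above.

```python
def harvested_debt(ticks: list) -> tuple:
    """Return (harvested, total) over all Test checks in `ticks`."""
    tests = [
        ground["check"]
        for tick in ticks
        for ground in tick["grounds"]
        if ground.get("check", {}).get("by") == "test"
    ]
    # Harvested means the key is absent, not null or empty.
    harvested = sum(1 for check in tests if "counter_test" not in check)
    return harvested, len(tests)

def debt_line(ticks: list) -> str:
    """Format the trailing summary line in the wording quoted above."""
    n, m = harvested_debt(ticks)
    return (f"harvested-unproven: {n} of {m} test bindings "
            "have no counter-test (run ev guard to add one)")
```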

Identity

id = first 12 hex characters of SHA-256 over the canonical JSON of the hashed payload {decision, observe, grounds, parent_id} — and only those fields. The canonical encoding is RFC 8785 / JCS: object keys sorted, compact separators, raw (un-escaped) UTF-8. This holds here precisely because the payload is string-only — it carries no numbers, booleans, or nulls, so JCS number canonicalization never has to be applied. Liveness sets are sorted and de-duplicated before hashing; the grounds array keeps its authored order.

Because identity is the hash of the payload, any change to a hashed field produces a different id — there is no in-place edit (see Append-only). Conversely, editing a bookkeeping field (e.g. blame) leaves the id unchanged.

Three frozen golden vectors pin this function so the hashing can never silently drift:

Vector      id
genesis     e2b337f53a1f
case1       638c47b0c9dd
harvested   0cf784b51331

ev verify --self-test recomputes all three and fails if any id moves. harvested is case1 with its first ground's counter_test omitted — it pins that omit-on-absence keeps a harvested binding's id byte-stable, so adding the optional-counter-test schema moved no existing id.
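The identity function can be approximated in a few lines. This is a sketch, not ev's implementation: json.dumps with sorted keys and compact separators only approximates RFC 8785 for string-only payloads, the helper names are hypothetical, and the result has not been checked against the golden vectors above, which remain the authoritative test.

```python
import hashlib
import json

HASHED_KEYS = ("decision", "observe", "grounds", "parent_id")

def canonical_payload(tick: dict) -> dict:
    """Keep only the hashed fields; sort and de-duplicate liveness sets."""
    grounds = []
    for ground in tick["grounds"]:  # authored order is preserved
        ground = dict(ground)
        check = ground.get("check")
        if check and "liveness" in check:
            liveness = {k: sorted(set(v)) for k, v in check["liveness"].items()}
            ground["check"] = {**check, "liveness": liveness}
        grounds.append(ground)
    payload = {key: tick[key] for key in HASHED_KEYS}
    payload["grounds"] = grounds
    return payload

def tick_id(tick: dict) -> str:
    """First 12 hex characters of SHA-256 over the canonical JSON."""
    blob = json.dumps(canonical_payload(tick), sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]
```

Under this sketch, changing blame leaves tick_id unchanged, while changing decision or reordering grounds yields a different id.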

Append-only immutability

The chain is never edited in place. A change is a new child whose parent_id points at its predecessor, and whose own id is the hash of its (new) payload. This is why ev guard — which adds a check, a hashed field — writes a new child rather than mutating the tick it targets. HEAD tracks the latest tick; ev guard can only amend the current HEAD.

Even a non-hashed tag (authority / jurisdiction / provenance) is never rewritten in place. A prior ruling is replaced with ev supersede (see commands.md), which appends a child carrying the supersedes edge — either a re-tag (copies the target's hashed payload verbatim, fixes a standing tag) or an overturn (a fresh ruling with its own grounds) — consistent with the same append-only law. The superseded tick stays as honest history; ev brief and ev list collapse the lineage to its current state, so the current ruling surfaces while ev log still shows the full lineage and ev reopen <id> marks an overturned ruling "superseded by".
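The append-only law can be pictured with an in-memory store keyed by id. The sketch below is illustrative only: append_child and payload_id are hypothetical names, the hashing approximates the canonical encoding, and held_since is left out for brevity.

```python
import hashlib
import json

def payload_id(payload: dict) -> str:
    """First 12 hex of SHA-256 over sorted, compact JSON (approximates the canonical form)."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:12]

def append_child(store: dict, head: str, decision: str, observe: str,
                 grounds: list, blame: str) -> str:
    """Append a new tick under `head`; existing entries are never modified."""
    payload = {"decision": decision, "observe": observe,
               "grounds": grounds, "parent_id": head}
    new_id = payload_id(payload)
    store[new_id] = {**payload, "id": new_id, "status": "live", "blame": blame}
    return new_id  # the caller treats this as the new HEAD
```

Adding a check to a ground goes through the same path: the amended grounds produce a new payload, hence a new id and a new child, while the original tick stays byte-for-byte as it was.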

Jurisdiction — and the C/D structurally ungateable guarantee

A decision may carry a declared jurisdiction tag from the closed vocabulary {A, B, C, D} (out-of-vocabulary is refused). It is bookkeeping — not hashed — and it answers one question: may a not-green check on this decision fail a build?

Because it answers exactly one question, the vocabulary carries exactly two meanings — may-gate and detect-only — across its four labels. Treat A as the canonical may-gate label and C as the canonical detect-only label; B behaves identically to A and D to C. B/D are accepted as redundant aliases (so older ledgers keep validating), but prefer A/C for new decisions.

  • A / B — may gate. A decision in jurisdiction A or B behaves exactly as an un-tagged one: a bound check that reads red (or stale, not-run, …) trips ev check --exit-on-red. These are the decisions this repo owns and is willing to be stopped by.
  • C / D — detect-only, structurally ungateable. A decision in jurisdiction C or D may be surfaced but can never gate. This is enforced by two independent locks, not by convention:
    • Gate-time lock. In ev check, any not-green verdict on a C/D decision is mapped to the non-gating memo verdict before the --exit-on-red writer sees it. The row still prints (with the memo label, naming the decision), so the fact is never hidden — it just cannot flip the exit code. memo is a co-equal, non-gating fact, the sibling of exempt.
    • At-rest lock. ev verify refuses a C/D tick that carries any Check::Test on a ground (a C/D jurisdiction (detect-only) tick may carry no test check). A detect-only decision must hold no runnable test binding at all — so there is nothing that could gate. (This is a distinct invariant from the no-vacuous-binding rule; it is checked separately.)

The two locks together make "detect-only" a structural property of the record, not a flag a future code path might forget to honor: a C/D decision is ungateable at the gate and cannot even store a gating check at rest. This is what lets a repo import another team's history — rulings it wants to watch but has no authority to fail on — as jurisdiction C, honestly: surfaced forever, gating never.
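The two locks are simple enough to sketch side by side. The function names below are hypothetical; the verdict strings (green, red, stale, memo) and the C/D rule are taken from the description above.

```python
DETECT_ONLY = {"C", "D"}

def gate_verdict(verdict: str, jurisdiction=None) -> str:
    """Gate-time lock: a not-green verdict on a C/D decision becomes memo."""
    if jurisdiction in DETECT_ONLY and verdict != "green":
        return "memo"
    return verdict  # A, B, and un-tagged decisions gate as usual

def at_rest_violation(tick: dict) -> bool:
    """At-rest lock: a C/D tick may carry no test check on any ground."""
    if tick.get("jurisdiction") not in DETECT_ONLY:
        return False
    return any(ground.get("check", {}).get("by") == "test"
               for ground in tick["grounds"])
```

Either lock alone would leave a gap: the first guards the exit code, the second guarantees there is no stored test binding for a later code path to gate on.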

Provenance — how a decision entered the ledger

A decision may carry a declared provenance tag from the closed vocabulary {imported, agent-proposed, human-now} (out-of-vocabulary is refused). It is bookkeeping — not hashed — and it records how the tick entered the ledger, absent ⇒ human-now:

  • human-now — a human ruling captured now (the absent default).
  • agent-proposed — a machine-drafted decision awaiting a human (the human-pending lane).
  • imported — faithfully-transcribed history, recorded now but authored at some past time by the source it was migrated from.

The stamping discipline (the launder defense). provenance is stamped only at the migrate / canonical-intake boundary. ev decide and ev guard always author human-now unconditionally — there is no --provenance flag, and the fresh-authorship path never reads a caller-supplied value. This is what makes the field sound: an importer can never launder a forbidden op by claiming provenance=imported on a freshly authored decision, because the fresh path can never write imported. On ev migrate, the convenience extractor kinds stamp imported by default (history); a canonical record must declare its provenance explicitly (no default — an omitted provenance is refused), so a live runner emits agent-proposed and a backfill adapter emits imported.

The honest limit (declared, not verified). human-now is the fresh-authorship default — the mark of the decide/guard door, not a proof that a human was at the keyboard. ev performs no caller-identity check: an agent that calls ev decide also gets human-now. The human/agent boundary is therefore a convention — agents use ev propose, humans use ev decide / ev ratify — not a structural guarantee, and provenance is declared, not cryptographic (signing is a deliberate non-goal; see the honesty boundary). What is structural and machine-enforced is the inverse: a record declared agent-proposed can never gate (the gate-time lock maps it to non-gating memo) and is invisible to ev brief until a named human ratifies it with ev ratify.

The R5 provenance partition. Only one refusal arm is partitioned by provenance: the R5 lexical forbidden-op lint (auto-close / auto-prune / self-stop / auto-inherit). For a provenance=imported tick — faithfully transcribed text a human once wrote — an op-word hit is downgraded to a non-gating warning: (it is recorded, not authored now), surfaced by ev verify so it stays visible with a named human still on the hook. For agent-proposed and human-now the op-arm stays a hard violation (a live agent draft must not smuggle op-language; fresh authorship is held to the line). Every other refusal stays hard for all provenance, including imported: empty-blame (R5), the R3 self-evolve lint, the no-fabricated-author rule, and the C/D-no-test lock never soften. Exactly one arm softens, for exactly one provenance.
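The partition of the op-arm by provenance can be sketched as a single severity function. The name op_lint is hypothetical, and plain case-insensitive substring matching is an assumption; the text only says the real lint is lexical and best-effort.

```python
FORBIDDEN_OPS = ("auto-close", "auto-prune", "self-stop", "auto-inherit")

def op_lint(text: str, provenance=None):
    """Return None, "warning", or "violation" for the R5 op-arm (sketch)."""
    provenance = provenance or "human-now"  # absent means human-now
    # Assumption: a simple substring match stands in for the real lexical lint.
    if not any(word in text.lower() for word in FORBIDDEN_OPS):
        return None
    # Only imported history is softened; every other provenance stays hard.
    return "warning" if provenance == "imported" else "violation"
```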

Forward-compat — the two-tier schema

ev's on-disk schema is closed for everything that defines identity, and tolerant for everything that does not — a two-tier rule that lets a newer writer add a bookkeeping field without bricking an older reader:

  • Tier 1 — the hashed/identity set is STRICT. The keys {decision, observe, grounds, parent_id, id, status, held_since, blame}, and every nested key inside grounds / check / liveness, are parsed against a closed schema: a missing identity field, or any unknown key inside the hashed payload, is an error (field outside closed schema: <k>). The content-addressed id can never carry an unvalidated field.
  • Tier 2 — unknown top-level non-hashed keys are TOLERATED. A truly-unknown top-level key (one outside both the identity set and the known-non-hashed allow-list {authority, jurisdiction, source_ref, provenance}) is parsed through, not rejected, so a tick written by a future ev still loads. ev verify surfaces it as a warning: (not a violation), naming the key, so a typo'd field name stays visible rather than silently swallowed.
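A reader applying the two tiers might classify keys roughly as below. This is a sketch with hypothetical names: the key sets are copied as written in the two bullets above, only the ground level of the nested schema is shown, and the "missing identity field" wording is illustrative.

```python
IDENTITY_KEYS = {"decision", "observe", "grounds", "parent_id",
                 "id", "status", "held_since", "blame"}
KNOWN_NON_HASHED = {"authority", "jurisdiction", "source_ref", "provenance"}
GROUND_KEYS = {"claim", "supports", "check"}

def classify(tick: dict) -> tuple:
    """Return (errors, warnings) for one parsed tick."""
    errors, warnings = [], []
    # Tier 1: identity fields must be present, hashed objects are closed.
    for key in sorted(IDENTITY_KEYS - set(tick)):
        errors.append(f"missing identity field: {key}")
    for ground in tick.get("grounds", []):
        for key in sorted(set(ground) - GROUND_KEYS):
            errors.append(f"field outside closed schema: {key}")
    # Tier 2: unknown top-level keys are parsed through and only warned about.
    for key in sorted(set(tick) - IDENTITY_KEYS - KNOWN_NON_HASHED):
        warnings.append(f"warning: unknown top-level key {key}")
    return errors, warnings
```

A typo such as jurisdction therefore loads with a warning, while the same typo inside a ground is a hard error.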

There is an inert schema_version recorded in the store config; it is read lazily, only at this tolerate-vs-reject decision, and is not a parsed config field.

The forward-compatibility limit

Forward-compat is forward-only and cannot be retrofitted. A binary that predates a bookkeeping field has a schema closed for all top-level keys, so a tick that carries a newer field (jurisdiction, source_ref, provenance, or any future tolerated key) is rejected by that older ev verify, not tolerated — there is no way to teach an already-shipped reader to ignore a field added after it. The two-tier rule buys tolerance for future fields going forward; it cannot reach backward to a reader that already shipped. Stated plainly so no one assumes a guarantee that does not exist.

The refusals (R1–R6) as ev verify enforces them

ev verify scans every tick file and reports all violations it finds (not just the first). The refusals:

  • R1 — closed schema (hashed) + tolerant (non-hashed). Every tick, ground, check, and liveness object is parsed strictly for the hashed/identity tier: any field inside the hashed payload outside its fixed key set is rejected (field outside closed schema: <k>). A truly-unknown top-level non-hashed key is tolerated (parsed through) and surfaced as a warning:, never an error — see Forward-compat above. A C/D-jurisdiction tick that carries any test check is rejected (a C/D jurisdiction (detect-only) tick may carry no test check). Reported as R1/R2.
  • R2 — check shape. A check must be exactly a Person (by/ref) or a Test (by/ref/verified_at_sha/counter_test/liveness) — never a mix; by must be "test" or "person"; a test's verified_at_sha must be 40 lowercase hex; liveness sets must be non-empty. (At write time, R2 also forbids a single ground being both --revisit and --assume-test, and ev guard refuses to force a test onto a Person ground.) Reported as R1/R2.
  • R4 / R6 — id == hash + chain integrity. Each stored tick's recomputed hash must equal its filename (id != hash(payload) (R4/R6)), the in-file id field must equal the filename (stored id field … != filename (R6)), every non-empty parent_id must resolve to an existing tick (parent_id … does not resolve (R6)), and the parent chain must be acyclic (parent chain has a cycle (R6)).
  • R5 — every mutating op names a human. A tick with empty blame is a violation (empty blame (R5)) — for all provenance, including imported. A best-effort lexical lint also flags forbidden machine-initiated op language (auto-close, auto-prune, self-stop, auto-inherit). This op-arm is the one refusal partitioned by provenance (see Provenance above): for a provenance=imported tick it is downgraded to a non-gating warning: (faithfully-transcribed history, recorded not authored now); for agent-proposed and human-now it stays a hard violation. Every other R5 arm (empty-blame, no-fabricated-author) stays hard for all provenance.
  • R3 — the system is never the subject of self-evolve language. A best-effort lexical lint flags self-evolve / self-improve verbs (e.g. self-evolve, self-improve, self-grade) in the free-text fields, where the subject should be a human, not the system.

The R3 and R5 lints are heuristics over fixed word lists: a re-wording evades them. They are surfaced honestly as best-effort, not as semantic guarantees.
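The chain-integrity part of R6 (stored id equals filename, parents resolve, no cycles) can be sketched over a dict that maps each filename id to its parsed tick. The function name is hypothetical and the hash recomputation of R4 is left out; the message fragments follow the ones quoted above.

```python
def chain_violations(store: dict) -> list:
    """Collect every R6 chain violation found, not just the first."""
    found = []
    for file_id, tick in store.items():
        if tick.get("id") != file_id:
            found.append(f"{file_id}: stored id field != filename (R6)")
        parent = tick["parent_id"]
        if parent and parent not in store:
            found.append(f"{file_id}: parent_id {parent} does not resolve (R6)")
        # Walk towards genesis; revisiting an id means the chain loops.
        seen, cursor = set(), file_id
        while cursor and cursor in store:
            if cursor in seen:
                found.append(f"{file_id}: parent chain has a cycle (R6)")
                break
            seen.add(cursor)
            cursor = store[cursor]["parent_id"]
    return found
```

In this sketch a cycle is reported once for every tick that sits on it, which is redundant but keeps the walk simple.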

Honesty / trust boundary

ev completes one specific picture: does a human-vetted decision stay live, and is the check guarding it itself alive? It does that by content-addressing the decision record and by demanding that every test binding name a counter-test and the surfaces that keep it live — so a check that has quietly died becomes visible.

It does not claim tamper-resistance of offline test outcomes. ev records that a test was bound and the commit it was verified at, but it cannot prove an offline test result was honest. That is a documented boundary, not a guarantee — the same framing as the project README.

ev validates that grounds are well-formed, never that they are faithful to the adopter's source. When a decision is brought in through the canonical intake (see migrating.md), ev re-runs its own read-path validators on every ground and check — but a producer-owned adapter that mis-parses its source and emits structurally valid but wrong grounds is a producer bug ev cannot catch. The honest-capture law protects against ev synthesizing grounds; it cannot protect against an edge adapter fabricating them. This is stated plainly rather than overclaimed.

ev's triggers are git-recorded: a binding's triggered_by paths and a bound check going red are both detected from the commit history (ev check compares the latest receipt's commit against the declared triggered_by paths). External-state drift — a UI click, an org/config change, or an upstream-API behavior change that leaves no git commit — does not fire ev. ev is decision memory, not an environment sentinel; a check that can only fail on external state should be run on a timer (not currently supported), not bound to triggered_by.