Merged
Changes from 3 commits
9 changes: 8 additions & 1 deletion CHANGELOG.md
@@ -18,7 +18,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added bounded parser/report defaults to `ordvec-manifest` verification for
manifest JSON size, row-identity JSONL line length, row count,
duplicate-tracking memory, auxiliary artifact declaration count and bytes,
report issue count, and SQLite cached report size.

### Added

- Added named auxiliary artifact verification to `ordvec-manifest`, including
required/optional sidecar states, path/size/SHA-256 checks, deterministic
report entries, and SQLite cache invalidation for declared sidecar bytes.

## 0.3.0 - 2026-05-29

7 changes: 4 additions & 3 deletions README.md
@@ -272,9 +272,10 @@ structurally valid file can still be untrusted. If an index file crosses a
trust boundary (network transfer, shared storage), verify it before loading.
The full GitHub checkout includes a publish=false sidecar CLI,
`ordvec-manifest`, that binds an index file to a JSON manifest by SHA-256,
header metadata, row identity, named auxiliary sidecars, and attestation shape
checks. It does not sign artifacts, manage keys, or decide deployment trust
policy. No in-format crypto is shipped because it would add key management the
library can't own. See
[`docs/PERSISTED_FORMAT.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/PERSISTED_FORMAT.md),
[`docs/INDEX_PROVENANCE.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/docs/INDEX_PROVENANCE.md),
and [`THREAT_MODEL.md`](https://github.com/Fieldnote-Echo/ordvec/blob/main/THREAT_MODEL.md)
11 changes: 11 additions & 0 deletions docs/INDEX_PROVENANCE.md
@@ -67,12 +67,23 @@ The manifest verifier checks:
- row identity, either explicit `row_id_identity` or a strict JSONL row map
whose `row_id` equals the zero-based line number and whose `db_id` is
non-empty, NUL-free, and unique by default;
- declared auxiliary artifacts, checking each caller-named sidecar's path,
SHA-256 digest, byte length, and configured byte ceiling under the same
default path policy as the primary index artifact;
- optional `calibration` profile references, checking profile identity,
path/hash integrity, encoder identity, and ordinalization compatibility;
- attestation **shape** only: predicate type, builder id when present, and at
least one subject SHA-256 matching the artifact when attestations are
supplied.
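
The row-identity rules in the list above can be sketched as a single pass over the row map. This is a minimal illustration, not the verifier's code. `check_row_map`, its `require_unique` switch, and the error strings are hypothetical, and the sketch assumes the JSONL has already been parsed into `(row_id, db_id)` pairs in line order.

```rust
use std::collections::HashSet;

/// Hypothetical check of the row-identity rules. Assumes the JSONL row map is
/// already parsed into `(row_id, db_id)` pairs in line order; the error
/// strings are illustrative, not the verifier's issue codes.
fn check_row_map(rows: &[(u64, &str)], require_unique: bool) -> Result<(), String> {
    let mut seen: HashSet<&str> = HashSet::new();
    for (line, &(row_id, db_id)) in rows.iter().enumerate() {
        // `row_id` must equal the zero-based line number.
        if row_id != line as u64 {
            return Err(format!("line {line}: row_id {row_id} is not {line}"));
        }
        // `db_id` must be non-empty and NUL-free.
        if db_id.is_empty() || db_id.contains('\0') {
            return Err(format!("line {line}: db_id is empty or contains NUL"));
        }
        // Uniqueness is on by default; `insert` returns false for a repeat.
        if require_unique && !seen.insert(db_id) {
            return Err(format!("line {line}: duplicate db_id {db_id:?}"));
        }
    }
    Ok(())
}
```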

Auxiliary artifacts are for application-owned sidecars such as metadata,
secondary indexes, or stores that a caller intends to load together with the
ordvec index. The verifier does not interpret those bytes; it only reports
whether declared required members were verified, whether optional members were
present or absent, and whether any declared member failed path, size, or digest
checks or exceeded the configured auxiliary artifact byte limit. Callers should
load sidecars only after the relevant declaration is verified.
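
A caller-side model of the per-declaration outcomes described above might look like the following sketch. The `Declared`, `Observed`, and `AuxState` types are hypothetical. Apart from `auxiliary_artifact_file_too_large`, which the `ordvec-manifest` README lists as a stable limit code, the failure strings are illustrative. The check order is an assumption, and so is the rule that an optional member which is present but fails a check is reported as failed.

```rust
/// Hypothetical model of one declared sidecar (only the fields this sketch needs).
struct Declared<'a> {
    required: bool,
    file_size_bytes: u64,
    sha256_hex: &'a str,
}

/// What a caller observed for that declaration after applying the path policy.
enum Observed<'a> {
    PathRejected,
    Absent,
    Present { size: u64, sha256_hex: &'a str },
}

#[derive(Debug, PartialEq)]
enum AuxState {
    Verified,
    OptionalAbsent,
    Failed(&'static str),
}

/// Classify one declaration. Size checks come before the digest comparison,
/// mirroring a verifier that can reject on size before hashing.
fn classify(decl: &Declared, obs: &Observed, max_bytes: u64) -> AuxState {
    match obs {
        Observed::PathRejected => AuxState::Failed("path_rejected"),
        Observed::Absent if decl.required => AuxState::Failed("missing"),
        Observed::Absent => AuxState::OptionalAbsent,
        Observed::Present { size, .. } if *size > max_bytes => {
            AuxState::Failed("auxiliary_artifact_file_too_large")
        }
        Observed::Present { size, .. } if *size != decl.file_size_bytes => {
            AuxState::Failed("size_mismatch")
        }
        Observed::Present { sha256_hex, .. } if *sha256_hex != decl.sha256_hex => {
            AuxState::Failed("digest_mismatch")
        }
        Observed::Present { .. } => AuxState::Verified,
    }
}
```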

When present, `calibration` binds an index artifact to a hashed ordinal profile
used to interpret overlap, bucket, sign, or rank evidence under a calibrated
null. The verifier checks profile identity, path/hash integrity, encoder
39 changes: 28 additions & 11 deletions ordvec-manifest/README.md
@@ -2,11 +2,11 @@

Repo-local, publish=false sidecar verifier for ordvec index manifests.

It verifies index bytes, probed header metadata, row identity, named auxiliary
artifacts, optional calibration profile references, and attestation shape before
a caller loads an ordvec index. It does not sign artifacts, manage keys, call
networks, mutate index files, decide deployment trust policy, compute
calibration statistics, or change the C ABI.

```sh
cargo run -p ordvec-manifest -- create \
```

@@ -36,14 +36,19 @@ Stable limit codes are part of the contract:
(`row_identity_row_count_limit_exceeded`);
- row-identity duplicate-tracking `db_id` bytes: 64 MiB
(`row_identity_duplicate_tracking_limit_exceeded`);
- auxiliary artifact declarations: 1,024
(`auxiliary_artifact_count_limit_exceeded`);
- auxiliary artifact bytes per declared file: 64 MiB
(`auxiliary_artifact_file_too_large`);
- collected report issues: 1,024, after which a
`verification_report_issue_limit_exceeded` issue is emitted;
- SQLite cached report JSON: 4 MiB (`sqlite_cached_report_too_large`).

The CLI exposes matching override flags on `inspect`, `verify`, `create`,
`sqlite verify`, and `sqlite activate`: `--max-manifest-bytes`,
`--max-row-map-line-bytes`, `--max-row-map-rows`,
`--max-row-map-tracked-id-bytes`, `--max-auxiliary-artifacts`,
`--max-auxiliary-artifact-bytes`, `--max-report-issues`, and
`--max-cached-report-bytes`. Library callers can override the same ceilings via
`VerifyOptions::limits`.
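
As a rough model of how these defaults and overrides fit together, the sketch below mirrors only the two auxiliary-artifact ceilings. `AuxLimits` and both check functions are hypothetical stand-ins, not the real `VerifyOptions::limits` API. The default values and the returned codes come from the list above. Treating each ceiling as inclusive, so that exactly 1,024 declarations still passes, is an assumption.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical stand-in for the two auxiliary-artifact ceilings; not the real
/// `VerifyOptions::limits` type. Only the default values come from the docs.
#[derive(Clone, Copy, Debug)]
struct AuxLimits {
    max_auxiliary_artifacts: usize,
    max_auxiliary_artifact_bytes: u64,
}

impl Default for AuxLimits {
    fn default() -> Self {
        AuxLimits {
            max_auxiliary_artifacts: 1024,
            max_auxiliary_artifact_bytes: 64 * MIB,
        }
    }
}

/// Intended to run once per manifest, before any sidecar is opened.
fn check_declaration_count(count: usize, limits: &AuxLimits) -> Result<(), &'static str> {
    if count > limits.max_auxiliary_artifacts {
        Err("auxiliary_artifact_count_limit_exceeded")
    } else {
        Ok(())
    }
}

/// Intended to run once per declared file.
fn check_file_size(size: u64, limits: &AuxLimits) -> Result<(), &'static str> {
    if size > limits.max_auxiliary_artifact_bytes {
        Err("auxiliary_artifact_file_too_large")
    } else {
        Ok(())
    }
}
```

Overriding one ceiling while keeping the others at their defaults, which is what the `--max-auxiliary-artifacts` flag does on the CLI, maps onto struct update syntax: `AuxLimits { max_auxiliary_artifacts: 8, ..AuxLimits::default() }`.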

@@ -55,6 +60,8 @@ Stable limit codes:
| row-identity JSONL line bytes | `row_identity_line_too_large` | `row_identity_line_too_large` |
| row-identity JSONL rows | `row_identity_row_count_limit_exceeded` | `row_identity_row_count_limit_exceeded` |
| row-identity duplicate-tracking `db_id` bytes | `row_identity_duplicate_tracking_limit_exceeded` | `row_identity_duplicate_tracking_limit_exceeded` |
| auxiliary artifact declarations | `auxiliary_artifact_count_limit_exceeded` | n/a |
| auxiliary artifact bytes per declared file | `auxiliary_artifact_file_too_large` | n/a |
| collected verification report issues | `verification_report_issue_limit_exceeded` | n/a |
| SQLite cached report JSON bytes | n/a | `sqlite_cached_report_too_large` |

@@ -67,11 +74,21 @@ otherwise be reported. These limits bound metadata parsing and report/cache
growth; hashing an index or calibration profile is still proportional to the
artifact bytes being verified.
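
Hashing cost scales with artifact size. One way to keep a per-file byte ceiling cheap is to enforce it while streaming instead of after the whole file has been read. The helper below is a hypothetical, dependency-free sketch of that pattern built on `std::io::Read::take`. The `sink` closure stands in for an incremental SHA-256 hasher, which is not shown.

```rust
use std::io::{self, Read};

#[derive(Debug)]
enum BoundedReadError {
    Io(io::Error),
    TooLarge,
}

/// Hypothetical helper: stream `reader` into `sink` in 8 KiB chunks and fail as
/// soon as more than `max_bytes` have been read. Returns the total byte count.
fn read_bounded<R: Read>(
    reader: R,
    max_bytes: u64,
    mut sink: impl FnMut(&[u8]),
) -> Result<u64, BoundedReadError> {
    // Allow one byte past the ceiling so an oversized input is detected
    // without reading the rest of it.
    let mut limited = reader.take(max_bytes.saturating_add(1));
    let mut buf = [0u8; 8192];
    let mut total: u64 = 0;
    loop {
        let n = limited.read(&mut buf).map_err(BoundedReadError::Io)?;
        if n == 0 {
            return Ok(total);
        }
        total += n as u64;
        if total > max_bytes {
            return Err(BoundedReadError::TooLarge);
        }
        // An incremental SHA-256 hasher would be updated here.
        sink(&buf[..n]);
    }
}
```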

Manifests may declare `auxiliary_artifacts` for caller-owned sidecars that
should be integrity-checked with the same path policy as the primary index.
Each entry has a stable `name`, relative `path`, lowercase SHA-256 digest,
`file_size_bytes`, and a `required` flag that defaults to `true`. Required
members fail verification when missing, tampered, size-mismatched, or rejected
by path policy. Optional members are reported as verified when present or as
`optional_absent` with a stable reason code when absent. The verifier checks
bytes only; application semantics remain with the caller.
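
The entry shape described above can be checked before any sidecar bytes are read. This sketch uses a hypothetical `AuxEntry` type with `required: Option<bool>` to model the `true` default. It reads "relative path" in one specific way: every component must be a plain name. The verifier's actual default path policy is not spelled out here and may be stricter.

```rust
use std::path::{Component, Path};

/// Hypothetical in-memory form of one `auxiliary_artifacts` entry. `required`
/// is optional so an omitted flag can default to `true`; `file_size_bytes` is
/// left out because this sketch only checks field shape.
struct AuxEntry<'a> {
    name: &'a str,
    path: &'a str,
    sha256: &'a str,
    required: Option<bool>,
}

impl AuxEntry<'_> {
    fn is_required(&self) -> bool {
        self.required.unwrap_or(true)
    }
}

/// Assumed shape checks; the real default path policy may be stricter.
fn validate_entry(e: &AuxEntry) -> Result<(), &'static str> {
    if e.name.is_empty() {
        return Err("empty name");
    }
    // Every component must be `Normal`, which rules out absolute paths,
    // drive prefixes, `..`, and a leading `.`.
    let relative = !e.path.is_empty()
        && Path::new(e.path).components().all(|c| matches!(c, Component::Normal(_)));
    if !relative {
        return Err("path not relative");
    }
    // Lowercase hex SHA-256 digest: exactly 64 characters from 0-9 and a-f.
    let lower_hex = e.sha256.len() == 64
        && e.sha256.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !lower_hex {
        return Err("digest is not lowercase SHA-256 hex");
    }
    Ok(())
}
```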

With `--features sqlite`, the `sqlite verify` and `sqlite activate` subcommands
add a local cache/audit log plus one active-manifest pointer. This is not a
full named registry. `sqlite verify --use-cache` reuses only reports whose
manifest, verification options, artifact bytes, row-identity bytes,
calibration profile bytes, and declared auxiliary artifact states/bytes still
match; otherwise it runs fresh verification and stores a new report.
`sqlite activate --force` writes the active pointer even when verification
fails, emits a `sqlite_activation_forced` warning in JSON output, and exits zero
because it did mutate activation state.
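
The reuse rule for `sqlite verify --use-cache` amounts to an exact-match comparison over every input a report depends on. The sketch below is hypothetical. `CacheKey`, its fields, and the string encoding of auxiliary states are illustrative assumptions, not the crate's SQLite schema.

```rust
/// Hypothetical fingerprint of the inputs a cached report depends on, per the
/// paragraph above. Field names and string encodings are illustrative.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CacheKey {
    manifest_sha256: String,
    options_fingerprint: String,
    artifact_sha256: String,
    row_identity_sha256: String,
    calibration_sha256: Option<String>,
    /// (name, state, digest when present) per declared auxiliary artifact, in manifest order.
    auxiliary: Vec<(String, String, Option<String>)>,
}

#[derive(Debug, PartialEq)]
enum CacheDecision {
    Reuse,
    Reverify,
}

/// Reuse only on an exact match of every input; anything else forces a fresh run.
fn decide(cached: Option<&CacheKey>, current: &CacheKey) -> CacheDecision {
    match cached {
        Some(key) if key == current => CacheDecision::Reuse,
        _ => CacheDecision::Reverify,
    }
}
```

Deriving `PartialEq` over the whole key keeps the rule conservative. An optional sidecar that appears, disappears, or changes digest alters its tuple, which forces a fresh verification.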