diff --git a/.claude/skills/audit-rule-author/SKILL.md b/.claude/skills/audit-rule-author/SKILL.md
new file mode 100644
index 0000000..b10c988
--- /dev/null
+++ b/.claude/skills/audit-rule-author/SKILL.md
@@ -0,0 +1,484 @@
+---
+name: audit-rule-author
+description: >
+  Author and maintain compliance audit rule configuration files for the Docs-digitization
+  pipeline. Use this skill whenever the user wants to: add a new rule to an existing audit
+  framework, update an existing rule's config or pass criteria, reduce false positives from
+  a rule, create an entirely new audit framework (new rule set), tune skip conditions or
+  severity, configure cannot_evaluate for rules needing external data, change a rule's
+  scope or applicable document/section types, or analyze compliance_result.json evaluation
+  results to diagnose and improve rules. Triggers on phrases like: "add a rule",
+  "update rule", "create a new audit", "write a compliance check", "too many false positives",
+  "rule is firing incorrectly", "add a new framework", "configure audit rules",
+  "author compliance rules", "new audit type", "analyze results", "look at findings",
+  "why is this rule failing", "what's non-compliant", "review evaluation results".
+---
+
+# Audit Rule Authoring
+
+You help author and maintain compliance audit rule configuration files for the
+Docs-digitization pipeline. Rules live in `backend/app/compliance/rules/`.
+
+**Read `references/rule_authoring_playbook.md` now** — it is the authoritative guide for
+all authoring conventions. Follow it exactly. This SKILL.md only covers the workflow
+orchestration on top of it.
+
+If the rule you are authoring or fixing uses `evaluation_strategy: agentic_audit`, also
+**read `references/agentic_audit_guide.md`** — it covers target vs context design,
+pass_criteria patterns (signature images, multi-row tables, NOT_APPLICABLE thresholds),
+and validation with the postpass script.
+
+---
+
+## Document data location
+
+All processed document data lives at `<storage-root>/<doc_id>/`. The storage root is
+`AT_STORAGE__BASE_PATH` (the same variable the pipeline and API use), or
+`backend/data/documents` when you run from the repo root:
+
+```
+<storage-root>/<doc_id>/
+├── result.json          ← OCR output: raw_markdown per page (keyed by string page number)
+├── segmentation.json    ← page-range sections (section_type, name, document_type, start/end page)
+└── ...
+```
+
+(`peek_pages.py` accepts `--data-root` and honors `AT_STORAGE__BASE_PATH` if your
+docs live elsewhere.)
+
+**Before writing any rule**, use the `peek_pages.py` script to read the actual page
+content so your `pass_criteria` is grounded in real evidence:
+
+```bash
+# List all sections and their page ranges for a document
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --list-sections
+
+# View full OCR markdown for specific pages
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33,34
+
+# View section metadata only (no markdown dump)
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33 --no-markdown
+
+# Page ranges also work
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33-35
+```
+
+`<skill-path>` in these commands is the skill base directory, which is shown at the top of
+this file when the skill loads; the script itself lives at `<skill-path>/scripts/peek_pages.py`.
+
+When the user gives you a doc ID and page numbers, always `--list-sections` first to
+confirm section types before authoring rules — it prevents scoping mistakes.
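+
+Page specs like `33,34` and `33-35`, and the string-keyed pages in `result.json`, are easy to
+trip over when you inspect data by hand. A minimal sketch of the lookup, assuming `result.json`
+maps each page number (as a string) directly to an object with a `raw_markdown` field; the real
+top-level layout may nest pages differently, and `parse_pages` / `page_markdown` / `load_result`
+are hypothetical helpers, not functions exported by `peek_pages.py`:
+
+```python
+import json
+from pathlib import Path
+
+
+def parse_pages(spec: str) -> list[int]:
+    """Expand a page spec like "33,34" or "33-35" into sorted page numbers."""
+    pages: set[int] = set()
+    for part in spec.split(","):
+        part = part.strip()
+        if not part:
+            continue
+        if "-" in part:
+            start, end = (int(x) for x in part.split("-", 1))
+            pages.update(range(start, end + 1))
+        else:
+            pages.add(int(part))
+    return sorted(pages)
+
+
+def page_markdown(result: dict, page: int) -> str:
+    """Return raw OCR markdown for one page; keys are strings, not ints."""
+    # Assumed layout: {"33": {"raw_markdown": "..."}, ...}
+    entry = result.get(str(page), {})
+    return entry.get("raw_markdown", "")
+
+
+def load_result(storage_root: Path, doc_id: str) -> dict:
+    """Read <storage-root>/<doc_id>/result.json."""
+    return json.loads((storage_root / doc_id / "result.json").read_text())
+```
+
+Indexing with an integer (`result[33]`) raises `KeyError` because the keys are strings, which
+is why the helper converts the page number with `str()` first.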
+
+---
+
+## Determine the task
+
+Ask the user one question to orient yourself if the task isn't already clear:
+
+> **"Are you (a) adding or updating rules in an existing audit framework, (b) creating a brand-new audit framework, or (c) analyzing evaluation results to diagnose and improve existing rules?"**
+
+Then follow the appropriate path below.
+
+---
+
+## Path A: Add or update rules in an existing framework
+
+### 1. Read context files
+
+Read all three before touching anything:
+
+```
+backend/app/compliance/rules/<framework>_rules.md
+backend/app/compliance/rules/<framework>_rules.yaml
+backend/app/compliance/rules/document_profiles.yaml   ← canonical document/section types
+```
+
+The `document_profiles.yaml` is the source of truth for all valid `applicable_document_types`
+and `applicable_section_types` values. Never invent type names — use only what is defined there.
+
+### 2. Author the rule(s)
+
+Follow the authoring conventions in `references/rule_authoring_playbook.md`:
+- Rule text goes in `<framework>_rules.md`
+- YAML config goes in `<framework>_rules.yaml`
+- Use the minimum required fields plus any applicable optional fields
+
+### 3. Validate the rule against real pages
+
+Always peek the target pages first so `pass_criteria` is grounded in actual OCR content:
+
+```bash
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --list-sections
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages <target_pages>
+```
+
+Then choose the right validation tool based on `evaluation_strategy`.
+
+#### For `llm`, `vision`, `text_and_vision`, or `text_primary` rules: use `validate_cli`
+
+Ask the user for a doc ID and page numbers (one or two that should PASS, one or two that
+should FAIL) if you don't already have them.
+
+```bash
+cd backend/
+
+# positive samples — expect PASS
+.venv/bin/python -m app.compliance.rules.validate_cli \
+  --agent <framework> --rule <number> \
+  --doc <doc_id> --pages <page,page> --expect pass
+
+# negative samples — expect FAIL
+.venv/bin/python -m app.compliance.rules.validate_cli \
+  --agent <framework> --rule <number> \
+  --doc <doc_id> --pages <page,page> --expect fail
+```
+
+Add `--no-vlm` to skip vision during quick iteration. The CLI exits non-zero if any page
+contradicts the `--expect` outcome.
+
+Diagnose failures from the `reasoning` field:
+- Too strict → loosen language or add OCR-tolerance notes
+- Too vague → tighten criteria or add concrete examples
+- Wrong scope → adjust `applicable_section_types` or `applicable_page_types`
+- Wrong skip → a page is being skipped that shouldn't be, or vice versa
+
+#### For `agentic_audit` rules — use the postpass script
+
+**Do not use `validate_cli` for `agentic_audit` rules** — it ignores `context_sources`
+entirely and produces misleading results.
+
+Use the framework's agentic postpass script instead (always from `backend/`):
+
+```bash
+cd backend/
+.venv/bin/python scripts/run_<framework>_agentic_postpass.py <doc_id> --rule-number <N>
+
+# Add --verbose-debug for full step-by-step agent traces on stderr:
+.venv/bin/python scripts/run_<framework>_agentic_postpass.py <doc_id> --rule-number <N> --verbose-debug
+```
+
+The script outputs JSON with `status`, `confidence`, `reasoning`, and `evidence`.
+Read the `reasoning` field first — it almost always identifies exactly what went wrong.
+
+For full authoring and debugging guidance for `agentic_audit` rules, read
+**`references/agentic_audit_guide.md`** (loaded at the top of this session if the rule
+uses this strategy).
+
+### 4. Config validator
+
+```bash
+cd backend/
+.venv/bin/python - <<'PY'
+from app.compliance.rules.registry import get_registry
+from app.compliance.rules.profiles import validate_compliance_configs
+validate_compliance_configs(get_registry())
+print("OK")
+PY
+```
+
+Fix any errors before reporting done.
+
+---
+
+## Path B: Create a new audit framework
+
+A new framework touches three files: one registry entry in an existing file plus two new
+rule files. Work through them in this order:
+
+### 1. Register the framework
+
+Add an entry to `backend/app/compliance/rules/agents_meta.json`:
+
+```json
+{
+  "id": "<framework_id>",
+  "label": "<Human-readable label>",
+  "description": "<One-sentence description of what this framework audits>"
+}
+```
+
+`framework_id` must be lowercase, no spaces (e.g., `data_integrity`, `environmental`).
+
+### 2. Create the rule text file
+
+Create `backend/app/compliance/rules/<framework_id>_rules.md`.
+
+Structure it with category sections matching the framework's logical groupings.
+Each rule is one numbered line: `<number>. <Imperative statement of the GMP requirement.>`
+
+```markdown
+Category: <Category Name>
+
+1. <Rule statement.>
+2. <Rule statement.>
+
+--------------------------------------------------
+
+Category: <Next Category>
+
+3. <Rule statement.>
+```
+
+### 3. Create the YAML config file
+
+Create `backend/app/compliance/rules/<framework_id>_rules.yaml`.
+
+Start from this skeleton — fill in what's known, leave optional fields at defaults:
+
+```yaml
+defaults:
+  scope: page
+  severity: observation
+  evaluation_mode: llm
+  applicable_document_types: []
+  excluded_document_types: []
+  applicable_page_types: []
+  applicable_section_types: []
+  cross_section_requirements: []
+  keywords: []
+  pass_criteria: ""
+  skip_conditions: []
+  cannot_evaluate_reason: ""
+  requires_external_data: []
+  notes: ""
+
+categories:
+  <category_slug>:
+    severity: <major|minor|critical|observation>
+    applicable_page_types: []
+    rules:
+      1:
+        pass_criteria: >
+          <explicit condition>
+        skip_conditions:
+          - "<condition> -> not_applicable"
+```
+
+Read `document_profiles.yaml` for valid document and section type values.
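+
+The skeleton has three layers: `defaults`, per-category keys, and per-rule keys. The playbook
+is the authority on exact merge semantics; as a minimal sketch, assuming a shallow per-key
+override where any key set at a narrower level wins (including an explicit `[]`, which is how a
+rule clears an inherited `applicable_page_types` filter), resolution looks like this.
+`resolve_rule_config` is a hypothetical helper, not the registry's API:
+
+```python
+def resolve_rule_config(config: dict, category: str, rule_number: int) -> dict:
+    """Merge defaults <- category <- rule; narrower levels win per key (assumed)."""
+    cat = config["categories"][category]
+    # Category keys other than the nested rules map act as overrides.
+    category_overrides = {k: v for k, v in cat.items() if k != "rules"}
+    rule_overrides = cat["rules"][rule_number] or {}
+    resolved = dict(config.get("defaults", {}))
+    resolved.update(category_overrides)
+    resolved.update(rule_overrides)  # an explicit [] here still overrides
+    return resolved
+
+
+# Example config as it might look after yaml.safe_load (YAML "1:" keys load as ints).
+config = {
+    "defaults": {"severity": "observation", "applicable_page_types": []},
+    "categories": {
+        "attributable": {
+            "severity": "major",
+            "applicable_page_types": ["form"],
+            "rules": {
+                1: {"pass_criteria": "..."},
+                2: {"applicable_page_types": []},  # rule-level override clears the filter
+            },
+        }
+    },
+}
+```
+
+Under these assumptions rule 1 inherits `applicable_page_types: [form]` and `severity: major`
+from its category, while rule 2 evaluates every page type.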
+
+### 4. Validate each rule against real pages
+
+Follow the same validation process as Path A, Step 3 (`validate_cli`, or the postpass script
+for `agentic_audit` rules). For a new framework, aim for at least one positive and one
+negative sample per rule before committing the files.
+
+### 5. Config validator
+
+Run the validator (same command as Path A). Since the new agent class doesn't exist yet,
+the validator may warn about the unregistered agent — that is expected. Note it to the user:
+**agent class wiring is a separate engineering task** beyond this skill's scope.
+
+---
+
+## Path C: Analyze compliance results and improve rules
+
+Use this path when the user wants to diagnose why rules are failing, firing incorrectly,
+or producing uncertain results — by inspecting actual evaluation output.
+
+### 1. Filter evaluations
+
+Use `analyze_results.py` to slice `compliance_result.json` by any combination of
+category, status, and page numbers:
+
+```bash
+# List categories and status counts — always start here for an overview
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --list
+
+# Filter by status + category
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status non_compliant --category attributable
+
+# Filter by specific pages
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --pages 14,78,99
+
+# Filter by specific rule
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --rule ALC-ATT1
+
+# Combine filters — e.g. all uncertain on pages 10-30
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status uncertain --pages 10-30
+
+# Summary table only, no detail
+python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status non_compliant --summary-only
+```
+
+The script prints a summary table, per-evaluation detail (reasoning + evidence), and a
+`Failing pages per rule` JSON block at the end — use that JSON to drive validation in step 3.
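+
+When you need a slice that `analyze_results.py` doesn't offer, the same grouping is easy to
+reproduce. A minimal sketch, assuming evaluations sit under `agent_reports[].all_evaluations[]`
+(the location named in the applicability-trace section of this file) and that each carries
+`rule_id`, `status`, `category`, and `page_number` fields; those field names are assumptions,
+so check a real `compliance_result.json` first. `failing_pages_per_rule` is hypothetical:
+
+```python
+from collections import defaultdict
+from typing import Optional
+
+
+def iter_evaluations(result: dict):
+    """Yield every evaluation across all agent reports."""
+    for report in result.get("agent_reports", []):
+        yield from report.get("all_evaluations", [])
+
+
+def failing_pages_per_rule(result: dict, status: str = "non_compliant",
+                           category: Optional[str] = None) -> dict[str, list[int]]:
+    """Group page numbers by rule for evaluations matching status/category."""
+    grouped: dict[str, set[int]] = defaultdict(set)
+    for ev in iter_evaluations(result):
+        if ev.get("status") != status:
+            continue
+        if category is not None and ev.get("category") != category:
+            continue
+        # Field names rule_id / page_number are assumptions about the file.
+        grouped[ev["rule_id"]].add(ev["page_number"])
+    return {rule: sorted(pages) for rule, pages in sorted(grouped.items())}
+```
+
+Joining one rule's list with commas gives the `--pages` argument for the validation step.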
+
+### 2. Analyze and propose changes
+
+Read the `reasoning` and `evidence` fields of the filtered evaluations and identify patterns:
+
+- **False positives** (rule fires but shouldn't): OCR artifact in reasoning? Skip condition
+  too narrow? `pass_criteria` missing an acceptable entry type?
+- **False negatives** (rule should fire but doesn't): `pass_criteria` too lenient? Missing
+  a required check?
+- **Uncertain** results: `pass_criteria` ambiguous? Conflicting text/vision verdicts?
+- **not_applicable over-firing**: `skip_conditions` too broad?
+
+Cross-reference the current rule config in `<framework>_rules.yaml` before proposing.
+
+Present proposed changes clearly and **wait for user confirmation** before editing any file.
+
+### 3. Validate after changes
+
+After the user confirms and you've edited the YAML, re-run validation against the exact
+pages that appeared in the filtered results. Use the `Failing pages per rule` JSON from
+step 1 to know which pages to target.
+
+**For `llm`, `vision`, `text_and_vision`, `text_primary` rules:**
+
+```bash
+cd backend/
+.venv/bin/python -m app.compliance.rules.validate_cli \
+  --agent <framework> --rule <number> \
+  --doc <doc_id> --pages <failing_pages> --expect pass
+```
+
+**For `agentic_audit` rules** — page numbers don't apply; run at doc level:
+
+```bash
+cd backend/
+.venv/bin/python scripts/run_<framework>_agentic_postpass.py <doc_id> --rule-number <N>
+```
+
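+If validation still fails, iterate on `pass_criteria` and re-run until every targeted page
+matches its expected outcome. The `validate_cli` command above assumes you are fixing false
+positives; when fixing false negatives, run it with `--expect fail` instead.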
+
+---
+
+## Evaluation strategy selection
+
+### Step 1 — Does the rule need vision at all?
+
+Ask: *"Could OCR text alone reliably catch this violation?"*
+
+- **Yes, text is sufficient** → use `llm` (default). No `visual_checks` needed.
+  - Examples: keyword presence, date format checks, missing field labels, page numbering.
+- **No, the violation is only detectable from the image** → use `vision`, with at least one `visual_checks` tag (see below; without a tag the VLM falls back to a generic, less reliable prompt).
+  - Examples: smudge/fading, sticky notes, correction fluid, ink color.
+- **Both channels add value** → use `text_and_vision` or `text_primary` (see Step 2).
+
+### Step 2 — If both channels are used, can vision override text?
+
+| Strategy | When text says non_compliant but vision disagrees | When text says compliant but vision disagrees | Choose when… |
+|---|---|---|---|
+| `text_and_vision` | **Vision wins (de-escalates)** | **Text wins** | Vision rescues OCR false positives; vision cannot add new violations |
+| `text_primary` | Text wins | Vision wins (escalates) | Vision is a catch-only safety net — text is authoritative, vision can only add violations |
+| `llm_arbitrated` | A third LLM call resolves the conflict | — | Verdicts frequently conflict AND both channels are unreliable alone; expensive — use sparingly |
+
+**`text_and_vision`** — vision can ONLY de-escalate. When text is more severe than vision, vision wins and lowers
+the verdict. When vision is more severe than text (vision non-compliant, text compliant), TEXT still wins — vision
+cannot introduce a new violation. Use when OCR false positives are your main risk.
+
+**`text_primary`** — vision can ONLY escalate. When vision is more severe than text, vision wins and raises
+the verdict. When text is more severe (text non-compliant, vision compliant), TEXT wins. Use when text is the
+authoritative source AND you also want vision to catch visual-only violations text cannot see.
+
+**`llm_arbitrated`** — adds a third LLM call that sees both text and vision verdicts and resolves disagreements.
+Only justify this when the two channels regularly produce conflicting verdicts on the same evidence AND you cannot
+determine which should win by rule (e.g., OCR confidence is too variable to make `text_primary` reliable, but
+the page also contains visual signals that text can't capture). It is the most expensive strategy; prefer
+`text_and_vision` or `text_primary` first.
+
+### Step 3 — If using vision, choose the right `visual_checks` tag(s)
+
+**Vision evaluation is driven by declared `visual_checks` tags.** The pipeline sends each tag's
+domain-specific prompt to the VLM. Do not invent new tags; use only the supported list below.
+If no existing tag fits, a `vision` rule with no `visual_checks` falls back to a generic visual
+inspection prompt (plus `pass_criteria`, which the VLM always receives), which is less reliable.
+
+> **`pass_criteria` reaches both evaluators.** The text LLM and the VLM both receive `pass_criteria` as
+> part of their prompt. Write `pass_criteria` to describe what valid evidence looks like for the text
+> evaluator — OCR-aware guidance, valid entry formats, table layout notes. Do NOT add "Vision: …"
+> subsections; vision behaviour is controlled exclusively by `visual_checks` tags.
+
+**Supported `visual_checks` tags:**
+
+| Tag | What it detects |
+|-----|----------------|
+| `VC-STRIKE` | Correction methodology — single-line strikethrough vs. scribble/overwrite; checks initials+date adjacent |
+| `VC-SIGNATURE` | Signature field status — wet ink / typed / rubber stamp / empty; checks for accompanying date |
+| `VC-INK-COLOR` | Ink color compliance — blue/black (pass), pencil or non-permanent (fail) |
+| `VC-CORRECTION` | Prohibited corrections — white-out, erasure marks, overwriting, tape corrections |
+| `VC-STAMP-SEAL` | Stamps/seals/watermarks — ORIGINAL, CONTROLLED COPY, COPY, QA approval stamps |
+| `VC-ATTACHMENT` | Physical attachments — affixed labels/stickers, detached items, empty attachment spaces |
+| `VC-BARCODE` | Barcode/label readability — clear vs. smudged/damaged/obscured |
+| `VC-BLANK-FIELD` | Blank form fields — empty cells vs. filled / dash / N/A |
+| `VC-DOC-QUALITY` | Physical page quality — smudges, fading, water damage, tears, pencil marks |
+| `VC-CHART` | Chart/graph quality — axis labels, scale markings, units, data clarity |
+| `VC-CHROMATOGRAM` | Chromatogram integrity — peak labels, baseline, integration marks, printout metadata |
+| `VC-CHECKBOX` | Checkbox/tickmark status — checked / unchecked / N/A counts |
+| `VC-PAGINATION` | Page numbering — "Page X of Y" in header/footer, legibility |
+| `VC-STICKY-NOTE` | Temporary annotations — sticky notes, pencil annotations, draft watermarks |
+
+A rule can declare multiple tags; the VLM prompt is the union of all declared check descriptions.
+
+### Quick-reference: strategy × use case
+
+| Use case | Strategy | visual_checks |
+|----------|----------|---------------|
+| Text fields, dates, keywords | `llm` | none |
+| Signatures, handwritten entries | `text_and_vision` | `VC-SIGNATURE` |
+| Blank fields (OCR may miss empty cells) | `text_and_vision` | `VC-BLANK-FIELD` |
+| Correction method (strikethrough vs. scribble) | `text_and_vision` | `VC-STRIKE` |
+| Ink color | `vision` | `VC-INK-COLOR` |
+| White-out / erasure / overwrite | `text_and_vision` | `VC-CORRECTION` |
+| Page numbering (OCR splits "Page\n1 of 35") | `text_and_vision` | `VC-PAGINATION` |
+| Document smudging / physical damage | `vision` | `VC-DOC-QUALITY` |
+| Sticky notes / temporary annotations | `vision` | `VC-STICKY-NOTE` |
+| Barcodes / labels | `vision` | `VC-BARCODE` |
+| Charts / graphs | `vision` | `VC-CHART` |
+| Frequently conflicting text/vision with no clear winner | `llm_arbitrated` | as needed |
+
+---
+
+## Evaluation strategy merge behavior (reference)
+
+Merge logic is severity-based: non_compliant(4) > uncertain(3) > error(2) > compliant(1) > not_applicable(0).
+
+| Strategy | Tie (same severity) | Vision more severe | Text more severe |
+|---|---|---|---|
+| `text_and_vision` | **Text wins** | **Text wins** | Vision wins (de-escalates) |
+| `text_primary` | **Text wins** | Vision wins (escalates) | **Text wins** |
+| `vision` | Vision only | — | — |
+| `llm_arbitrated` | Agreed verdict returned directly; LLM arbitrates conflicts | | |
+
+Key insight for `text_and_vision`: vision can **only** lower a verdict, never raise it. If text says compliant
+and vision says non-compliant, text still wins. This makes it safe to pair with any `VC-*` tag — a noisy
+vision check cannot introduce a false positive when text has already found the evidence of compliance.
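+
+The merge table can be written down as a small function, which helps when you want to predict
+the final verdict for a given pair of channel results. A minimal sketch of the severity ordering
+and the `text_and_vision` / `text_primary` rules above; `merge_verdicts` is illustrative, not the
+pipeline's implementation, and `llm_arbitrated` is left out because its outcome depends on the
+third LLM call:
+
+```python
+# Severity ranks from the merge-logic line above.
+SEVERITY = {
+    "non_compliant": 4,
+    "uncertain": 3,
+    "error": 2,
+    "compliant": 1,
+    "not_applicable": 0,
+}
+
+
+def merge_verdicts(strategy: str, text: str, vision: str) -> str:
+    """Combine text and vision verdicts per the merge table (illustrative)."""
+    if strategy == "vision":
+        return vision
+    t, v = SEVERITY[text], SEVERITY[vision]
+    if strategy == "text_and_vision":
+        # Vision may only lower the verdict; ties and escalations keep text.
+        return vision if t > v else text
+    if strategy == "text_primary":
+        # Vision may only raise the verdict; ties and de-escalations keep text.
+        return vision if v > t else text
+    raise ValueError(f"unsupported strategy: {strategy}")
+```
+
+For example, `merge_verdicts("text_and_vision", "compliant", "non_compliant")` keeps
+`"compliant"`, while the same inputs under `text_primary` yield `"non_compliant"`.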
+
+---
+
+## Pre-flight checklist (all paths)
+
+- [ ] Rule text in `.md` matches YAML entries (same numbers, same count)
+- [ ] Document and section types use canonical values from `document_profiles.yaml`
+- [ ] `pass_criteria` is explicit and testable (see playbook for what "explicit" means)
+- [ ] No "Vision: …" subsections in `pass_criteria` — vision is controlled by `visual_checks` tags only
+- [ ] `skip_conditions` represent true non-applicability, not failure
+- [ ] External dependencies use `cannot_evaluate` with reason + data list
+- [ ] **Page type filter checked**: if the category sets `applicable_page_types: [form]`, verify the target
+  section's pages are actually classified as `form`. If not (e.g., `content`), override with
+  `applicable_page_types: []` at the rule level. Use `applicability_trace` in `compliance_result.json`
+  to diagnose — a `page_type: fail` trace entry is the symptom.
+- [ ] Validation passed on at least one positive and one negative sample:
+  - `llm`/`vision`/`text_and_vision`/`text_primary` → `validate_cli`
+  - `agentic_audit` → postpass script (see `references/agentic_audit_guide.md`)
+- [ ] Config validator passes (or expected warnings are explained)
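+
+The first checklist item is mechanical enough to script. A minimal sketch, assuming every line
+in the `.md` file that starts with `<number>. ` is a rule (as in the Path B template; other
+numbered lists would confuse it) and that the YAML has already been loaded into a dict, for
+example with PyYAML's `yaml.safe_load`. `rule_number_mismatches` is a hypothetical helper:
+
+```python
+import re
+
+# Assumption: rule lines look like "12. Statement." at the start of a line.
+RULE_LINE = re.compile(r"^(\d+)\.\s", re.MULTILINE)
+
+
+def rule_number_mismatches(md_text: str, yaml_config: dict) -> dict[str, list[int]]:
+    """Report rule numbers present in only one file, plus .md duplicates."""
+    md_numbers = [int(n) for n in RULE_LINE.findall(md_text)]
+    yaml_numbers = [
+        int(n)
+        for category in yaml_config.get("categories", {}).values()
+        for n in (category.get("rules") or {})
+    ]
+    return {
+        "missing_in_yaml": sorted(set(md_numbers) - set(yaml_numbers)),
+        "missing_in_md": sorted(set(yaml_numbers) - set(md_numbers)),
+        "duplicate_in_md": sorted({n for n in md_numbers if md_numbers.count(n) > 1}),
+    }
+```
+
+All three lists empty means the `.md` and `.yaml` agree on numbers and count.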
+
+### Diagnosing "rule not firing" with applicability_trace
+
+When a rule returns `not_applicable` unexpectedly, read its `applicability_trace` in
+`compliance_result.json` (under `agent_reports[].all_evaluations[]`). Each entry explains exactly
+which filter caused the skip:
+
+- `section_type: fail` → rule's `applicable_section_types` doesn't match the page's section
+- `page_type: fail` → **inherited** `applicable_page_types` from the category is blocking it;
+  override the rule with `applicable_page_types: []`
+- `document_type: fail` → document type mismatch
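+
+When many evaluations of a rule return `not_applicable`, it helps to tally which filter is
+responsible. A minimal sketch, assuming each evaluation has a `rule_id` and a `status`, and that
+its `applicability_trace` is a mapping from filter name (`section_type`, `page_type`,
+`document_type`) to `"pass"` or `"fail"`; the real trace may be a list of entries with extra
+detail, so adapt the access pattern after inspecting one. `blocking_filters` is hypothetical:
+
+```python
+from collections import Counter
+
+# Suggested remedy per blocking filter, taken from the list above.
+FIXES = {
+    "section_type": "check the rule's applicable_section_types",
+    "page_type": "override inherited applicable_page_types with [] on the rule",
+    "document_type": "check applicable_document_types / excluded_document_types",
+}
+
+
+def blocking_filters(result: dict, rule_id: str) -> Counter:
+    """Count which filters failed across a rule's not_applicable evaluations."""
+    counts: Counter = Counter()
+    for report in result.get("agent_reports", []):
+        for ev in report.get("all_evaluations", []):
+            if ev.get("rule_id") != rule_id or ev.get("status") != "not_applicable":
+                continue
+            trace = ev.get("applicability_trace") or {}  # assumed dict shape
+            for filter_name, outcome in trace.items():
+                if outcome == "fail":
+                    counts[filter_name] += 1
+    return counts
+```
+
+`blocking_filters(result, rule_id).most_common(1)` names the dominant filter; look up its remedy
+in `FIXES`.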
+
+---
+
+## References
+
+| File | When to read |
+|------|-------------|
+| `references/rule_authoring_playbook.md` | Every session — authoring conventions, field semantics, anti-patterns |
+| `references/agentic_audit_guide.md` | When authoring or debugging any `agentic_audit` rule — target/context design, pass_criteria patterns, postpass validation |
diff --git a/.claude/skills/audit-rule-author/evals/evals.json b/.claude/skills/audit-rule-author/evals/evals.json
new file mode 100644
index 0000000..8317dbe
--- /dev/null
+++ b/.claude/skills/audit-rule-author/evals/evals.json
@@ -0,0 +1,26 @@
+{
+  "skill_name": "audit-rule-author",
+  "evals": [
+    {
+      "id": 0,
+      "name": "update-existing-rule",
+      "prompt": "Rule 27 in the ALCOA framework keeps flagging pages as non-compliant because OCR is misreading handwritten times — things like '14:30' being read as '14;30' or '1430'. Can you update the pass criteria to handle these OCR artifacts properly?",
+      "expected_output": "Updated pass_criteria for rule 27 in alcoa_rules.yaml that explicitly handles OCR time-reading artifacts, validator passes.",
+      "files": []
+    },
+    {
+      "id": 1,
+      "name": "add-to-gmp",
+      "prompt": "I need to add a rule to the GMP framework: equipment calibration due dates must be recorded alongside each equipment usage entry in manufacturing operations. If the calibration is expired or not recorded, that's a major finding.",
+      "expected_output": "New rule added to gmp_rules.md with rule text and gmp_rules.yaml with proper YAML config (applicable_section_types, pass_criteria, skip_conditions, severity), validator passes.",
+      "files": []
+    },
+    {
+      "id": 2,
+      "name": "new-framework",
+      "prompt": "We need a brand-new audit framework called 'lab_notebook' for laboratory notebooks. It should cover three things: (1) raw data entries are not altered or overwritten, (2) each entry has a date and the researcher's signature, (3) any corrections use proper strikethrough with initials and date. Set severity to major for all rules.",
+      "expected_output": "agents_meta.json updated with lab_notebook entry, lab_notebook_rules.md created with 3 rules, lab_notebook_rules.yaml created with proper config for all 3 rules.",
+      "files": []
+    }
+  ]
+}
diff --git a/.claude/skills/audit-rule-author/references/agentic_audit_guide.md b/.claude/skills/audit-rule-author/references/agentic_audit_guide.md
new file mode 100644
index 0000000..e93d673
--- /dev/null
+++ b/.claude/skills/audit-rule-author/references/agentic_audit_guide.md
@@ -0,0 +1,257 @@
+# Agentic Audit Rules — Authoring Guide
+
+Rules with `evaluation_strategy: agentic_audit` run a multi-step LLM agent rather than a
+single-pass evaluator. The agent receives: (a) the primary section's raw OCR pages and
+(b) pre-loaded raw OCR pages from `context_sources`. It reasons across both before
+producing a verdict.
+
+Read this guide whenever you are **authoring or debugging** an `agentic_audit` rule.
+
+---
+
+## When to choose agentic_audit vs llm
+
+| Situation | Strategy |
+|-----------|----------|
+| Single-page check against the page itself | `llm` |
+| Cross-reference between two different sections | `agentic_audit` |
+| Each row in a table must be verified against another section | `agentic_audit` |
+| Simple presence/format/signature check | `llm` |
+
+`agentic_audit` is slower and more expensive. Use it only when the compliance check
+genuinely requires cross-section evidence that cannot be encoded in a single `pass_criteria`.
+
+---
+
+## Designing target vs context
+
+The agent evaluates the **target section** (set by `applicable_section_types`) and uses
+the **context section** (set by `context_sources`) as reference material.
+
+**Key principle: the more structured, enumerable section should be the target.**
+
+The agent works best when it can iterate row-by-row over the target. The context can be
+large and unstructured — it is pre-loaded as raw OCR pages and the agent reads what it needs.
+
+**Example — weighing sheet vs manufacturing ops:**
+
+| Role | Section | Why |
+|------|---------|-----|
+| Target | `material_dispensing` (2 pages, structured table) | One row per material/step — easy to enumerate |
+| Context | `manufacturing_operations` (26 pages, step-by-step) | Agent looks up steps on demand |
+
+The agent checks each dispensing row then looks up its step number in the context.
+This is far easier than scanning 26 manufacturing pages for charging steps and
+cross-referencing backwards to the weighing sheet.
+
+**Anti-pattern:** Don't make a large, unstructured section the target when a
+smaller structured counterpart exists. The agent will miss items.
+
+---
+
+## YAML configuration
+
+```yaml
+rules:
+  1:
+    evaluation_strategy: agentic_audit
+    applicable_document_types: [batch_record]
+    applicable_section_types: [material_dispensing]    # target — evaluated directly
+    context_sources:
+      - document_type: batch_record
+        section_types: [manufacturing_operations]      # context — loaded as raw pages
+```
+
+`context_sources` is a list. Each entry has:
+- `document_type`: the package document type to pull from
+- `section_types`: list of section types to include; `[]` means all sections of that doc type
+
+The pipeline pre-loads all matching pages as raw OCR markdown **before** the agent starts.
+The agent sees the full content without calling any retrieval tools.
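+
+To predict which pages the pipeline will pre-load for a `context_sources` list (for example, so
+you can peek the same pages yourself), apply the matching rule above to the section list from
+`segmentation.json`. A minimal sketch, assuming each section record exposes `document_type`,
+`section_type`, `start_page`, and `end_page` (inclusive); those key names are an assumption, and
+`context_pages` is a hypothetical helper:
+
+```python
+def context_pages(sections: list[dict], context_sources: list[dict]) -> list[int]:
+    """Return sorted page numbers matched by any context_sources entry."""
+    pages: set[int] = set()
+    for source in context_sources:
+        wanted = set(source.get("section_types", []))
+        for sec in sections:
+            if sec["document_type"] != source["document_type"]:
+                continue
+            # An empty section_types list means "all sections of this doc type".
+            if wanted and sec["section_type"] not in wanted:
+                continue
+            pages.update(range(sec["start_page"], sec["end_page"] + 1))
+    return sorted(pages)
+```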
+
+---
+
+## Writing pass_criteria for agentic_audit
+
+### Recommended template structure
+
+```
+TARGET: [Describe the target section's layout — columns, row structure, what to look at.
+         Include the signature image note if signatures appear in the target.]
+
+CONTEXT: [Describe what the context section contains and how to use it.
+          Include the signature image note if signatures appear in the context.]
+
+[ROW STRUCTURE NOTE — only if the target has summary-header + sub-row pattern; see below]
+
+WHAT TO CHECK: [Step-by-step evaluation instructions. Enumerate items one by one.]
+
+SKIP: [Conditions where an item is skipped rather than failed.]
+
+VERDICT:
+- COMPLIANT if ...
+- NON-COMPLIANT if ... (cite specific item, row, or step number)
+- NOT_APPLICABLE only if [threshold — usually "no evaluable items exist"]
+```
+
+---
+
+### Pattern 1 — Signature images in OCR
+
+Both target and context pages represent handwritten signatures as `<img .../>` tags.
+The agent will treat them as empty cells unless you tell it otherwise.
+
+**Always include this note in `pass_criteria` wherever signatures appear:**
+
+> Done by / Checked by cells often contain embedded signature images (shown as `<img .../>`)
+> rather than plain text — treat any non-empty cell (image, text, or date alone) as
+> "executed/signed". Only blank cells or cells containing solely dashes (`—`, `-`) should
+> be treated as unsigned/not executed.
+
+This applies to both target pages and context pages. If signatures appear in the
+context (e.g., manufacturing ops), add the note in the CONTEXT block of your criteria.
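+
+The rule in that note is simple enough to express directly, which helps when you want to
+pre-check a table before blaming the agent. A minimal sketch of the classification, assuming
+each OCR cell arrives as a plain string that may contain an `<img .../>` tag; `is_executed` is
+illustrative only:
+
+```python
+# Per the note: only blank cells or cells holding nothing but dashes count as
+# unsigned. "\u2014" is the em dash character.
+DASH_CHARS = {"-", "\u2014"}
+
+
+def is_executed(cell: str) -> bool:
+    """True if a Done by / Checked by cell counts as signed/executed."""
+    text = cell.strip()
+    if not text:
+        return False
+    if all(ch in DASH_CHARS or ch.isspace() for ch in text):
+        return False
+    # Anything else (an <img .../> signature, initials, or a date alone) counts.
+    return True
+```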
+
+---
+
+### Pattern 2 — NOT_APPLICABLE threshold
+
+A common mistake is letting the agent mark a whole page NOT_APPLICABLE because it found
+**some** excluded items, even though other evaluable items exist on the same page.
+
+Always be explicit:
+
+> NOT_APPLICABLE only if EVERY item on this page qualifies as a skip condition.
+> Even a single evaluable item means the page must return COMPLIANT or NON_COMPLIANT.
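+
+The threshold reads as an aggregation rule over per-item outcomes on a page. A minimal sketch,
+assuming each item has already been judged `compliant`, `non_compliant`, or `skip`; it mirrors
+the wording of the note, not the agent's internals:
+
+```python
+def page_verdict(item_outcomes: list[str]) -> str:
+    """Aggregate per-item outcomes into a page verdict."""
+    evaluable = [o for o in item_outcomes if o != "skip"]
+    if not evaluable:
+        # Only when EVERY item is skipped (or none exist) is the page N/A.
+        return "not_applicable"
+    if "non_compliant" in evaluable:
+        return "non_compliant"
+    return "compliant"
+```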
+
+---
+
+### Pattern 3 — Summary header vs step sub-rows
+
+Some pharmaceutical tables use a two-level row structure:
+- **Summary header row**: all step numbers aggregated, total quantity, final sign-off
+- **Individual step sub-rows** (indented below): one row per step, step-specific qty and sign-off
+
+When evaluating step-by-step, the **sub-rows are authoritative**. The summary header is a
+roll-up sign-off, not a per-step record. If the summary header and a sub-row conflict
+(different step numbers, different "Done by" status), trust the sub-row.
+
+Add this to `pass_criteria` when the target section has this structure:
+
+> ROW STRUCTURE: Each material has a summary header row (all steps aggregated, total qty,
+> final sign-off) and individual step sub-rows (one per step, step-specific qty and sign-off).
+> When sub-rows exist, use the sub-row's Done by / Checked by to determine whether material
+> was dispensed for THAT step — do not use the summary header's signature to infer
+> per-step dispensing status.
+> SKIP sub-rows where Done by AND Checked by are both blank/dashes — those represent
+> conditional steps not executed (e.g., a re-wash step skipped because the pH test passed).
+> If a material has NO sub-rows, use the header row's Done by / Checked by.
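+
+The row-structure note boils down to a small precedence rule. A minimal sketch, assuming rows
+have been parsed into dicts with `done_by` and `checked_by` strings and that sub-rows are grouped
+under their material's header row; the data shape and `rows_to_evaluate` are hypothetical:
+
+```python
+from typing import Optional
+
+DASH_CHARS = {"-", "\u2014"}  # "\u2014" is the em dash character
+
+
+def _blank(cell: str) -> bool:
+    """Blank or dash-only cells count as not signed."""
+    text = cell.strip()
+    return not text or all(ch in DASH_CHARS or ch.isspace() for ch in text)
+
+
+def rows_to_evaluate(header: dict, sub_rows: Optional[list[dict]]) -> list[dict]:
+    """Pick the rows whose Done by / Checked by decide per-step dispensing status."""
+    if not sub_rows:
+        # No sub-rows: fall back to the header row's sign-off.
+        return [header]
+    # Sub-rows are authoritative; skip steps where both cells are blank,
+    # since those are conditional steps that were not executed.
+    return [
+        row for row in sub_rows
+        if not (_blank(row["done_by"]) and _blank(row["checked_by"]))
+    ]
+```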
+
+---
+
+### Pattern 4 — Exclusions need counter-examples
+
+When a rule excludes certain step types (e.g., "skip transfer steps"), the agent will
+often over-apply the exclusion to adjacent steps. Ground the exclusion with a concrete
+example AND a counter-example:
+
+> EXCLUDED: Transfer of a pre-prepared intermediate solution between vessels, e.g.
+> "Add sodium hydroxide solution from addition tank ATE013" — the original raw materials
+> were already charged at an earlier step.
+> IMPORTANT: This exclusion applies ONLY when the material being added is a named solution
+> whose constituents were charged and weighed at an earlier numbered step in the same batch
+> record. It does NOT apply to charging virgin raw materials directly from storage.
+> Counter-example: "dissolving 35.39 Kg of Caustic Soda Flakes in 354 L of Purified Water
+> in reactor SRE024" IS a charging step, even though the NaOH solution is later transferred.
+
+---
+
+## Validation
+
+**Do not use `validate_cli` for `agentic_audit` rules.** The validate_cli runs a single-pass
+LLM evaluator and ignores `context_sources` entirely — it will produce misleading results.
+
+Use the agentic postpass script instead:
+
+```bash
+cd backend/
+
+# Checklist framework:
+.venv/bin/python scripts/run_checklist_agentic_postpass.py <doc_id> --rule-number <N>
+
+# Add --verbose-debug to print full JSON snapshots of each agent step to stderr:
+.venv/bin/python scripts/run_checklist_agentic_postpass.py <doc_id> --rule-number <N> --verbose-debug
+```
+
+Postpass scripts are framework-specific because each imports its agent's `AGENT_NAME`
+constant, so check `backend/scripts/` for `run_<framework>_agentic_postpass.py` first. If
+your framework doesn't have one yet, creating it is straightforward: copy
+`run_checklist_agentic_postpass.py` and replace the `AGENT_NAME` import.
+
+The script outputs JSON. Key fields:
+- `status`: `compliant` / `non_compliant` / `not_applicable` / `uncertain`
+- `confidence`: 0.0–1.0
+- `reasoning`: the agent's step-by-step logic — **read this first when debugging**
+- `evidence`: specific data cited
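+
+If you drive the postpass script from your own tooling, a quick shape check on its JSON catches
+a malformed run before you start reading `reasoning`. A minimal sketch, assuming stdout contains
+exactly one JSON object with the four fields listed above; `check_postpass_output` is
+hypothetical:
+
+```python
+import json
+
+VALID_STATUSES = {"compliant", "non_compliant", "not_applicable", "uncertain"}
+REQUIRED_FIELDS = {"status", "confidence", "reasoning", "evidence"}
+
+
+def check_postpass_output(stdout: str) -> dict:
+    """Parse postpass JSON and verify the documented fields (assumed shape)."""
+    data = json.loads(stdout)
+    missing = REQUIRED_FIELDS - set(data)
+    if missing:
+        raise ValueError(f"missing fields: {sorted(missing)}")
+    if data["status"] not in VALID_STATUSES:
+        raise ValueError(f"unexpected status: {data['status']!r}")
+    if not 0.0 <= float(data["confidence"]) <= 1.0:
+        raise ValueError(f"confidence out of range: {data['confidence']}")
+    return data
+```
+
+Capture stdout with `subprocess.run(..., capture_output=True, text=True)` and pass `.stdout`
+in; the `--verbose-debug` traces go to stderr, so they don't interfere.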
+
+---
+
+## Debugging a wrong verdict
+
+### Step 1 — Peek both sections first
+
+Before editing `pass_criteria`, confirm the data is actually in the OCR:
+
+```bash
+# Target section
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages <target_pages>
+
+# Context section
+python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages <context_pages>
+```
+
+If the data isn't in the OCR, no change to the rule can fix the verdict: the problem lies in
+page classification or OCR.
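+
+Sometimes you don't know which pages hold a value, such as an equipment ID cited in the rule.
+In that case, searching the whole OCR output first can save several peeks. The sketch below
+assumes `result.json` stores per-page markdown under `raw_markdown`, keyed by page number as a
+string, which is how `peek_pages.py` reads it. Both helper names are hypothetical:
+
+```python
+import json
+from pathlib import Path
+
+
+def pages_containing(raw_markdown: dict[str, str], needle: str) -> list[int]:
+    """Page numbers whose OCR markdown contains `needle`, case-insensitively."""
+    target = needle.lower()
+    return sorted(int(page) for page, md in raw_markdown.items() if target in (md or "").lower())
+
+
+def find_in_ocr(doc_dir: Path, needle: str) -> list[int]:
+    """Load <doc_dir>/result.json the same way peek_pages.py does and search every page."""
+    data = json.loads((doc_dir / "result.json").read_text())
+    return pages_containing(data.get("raw_markdown", {}), needle)
+```
+
+Pass `find_in_ocr` the document folder under the storage root, then peek only the pages it returns.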
+
+### Step 2 — Read the reasoning field
+
+The `reasoning` field in the JSON output almost always tells you exactly what went wrong.
+Common patterns and their fixes:
+
+| Reasoning says | Root cause | Fix |
+|----------------|------------|-----|
+| "lack of explicit details" / "insufficient evidence" | Agent can't find the data in context | Add clearer pointers in the CONTEXT block; add concrete examples |
+| "step X not executed" but OCR shows a signature | Signature image guidance missing | Add `<img .../>` = signed note to CONTEXT block |
+| Page marked NOT_APPLICABLE despite evaluable items | NOT_APPLICABLE threshold too broad | Add "only if EVERY item is excluded" language |
+| Wrong material or step number cited | Summary header confused with sub-rows | Add row structure explanation |
+| Exclusion applied to a step that should be checked | Exclusion boundary not grounded | Add counter-example showing what is NOT excluded |
+
+### Step 3 — Run --verbose-debug
+
+```bash
+.venv/bin/python scripts/run_checklist_agentic_postpass.py <doc_id> --rule-number <N> --verbose-debug 2>&1
+```
+
+This prints each agent action (context preload, tool calls, verdict) to stderr as NDJSON.
+Look for the `preload_context` event to confirm how many characters of context loaded,
+and the `verdict` event to see the raw reasoning before synthesis.
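+
+To pull just those two events out of a long debug stream, redirect stderr to a file
+(`2> debug.ndjson`) instead of `2>&1`, so the final stdout JSON doesn't get mixed in. Then filter
+it with something like the sketch below. It assumes each NDJSON line is an object whose event
+name sits under an `event` key. That key name is a guess, so inspect one real line first. Lines
+that aren't valid JSON are skipped:
+
+```python
+import json
+from typing import Iterable, Iterator
+
+WANTED = {"preload_context", "verdict"}
+
+
+def select_events(lines: Iterable[str], wanted: set[str] = WANTED) -> Iterator[dict]:
+    """Yield parsed NDJSON records whose (assumed) `event` key is in `wanted`."""
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            record = json.loads(line)
+        except json.JSONDecodeError:
+            continue  # plain log text or fragments of pretty-printed JSON
+        if isinstance(record, dict) and record.get("event") in wanted:
+            yield record
+```
+
+For example, `list(select_events(open("debug.ndjson")))` returns only the preload and verdict records.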
+
+### Step 4 — Iterate
+
+Edit `pass_criteria`, re-run the postpass script, and check again. Verdicts vary somewhat between
+runs because the agent is LLM-driven, so if a fix seems logically correct but the result is still
+wrong, run it 2–3 times before concluding it didn't work.
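+
+If you repeat runs often, a small loop makes the variance visible. This is a sketch, not part of
+the repo. It assumes it runs from `backend/` and that stdout is a single JSON object with a
+`status` field, as described under Validation. Both `run_repeatedly` and `tally_statuses` are
+hypothetical helpers:
+
+```python
+import json
+import subprocess
+from collections import Counter
+
+SCRIPT = "scripts/run_checklist_agentic_postpass.py"
+
+
+def tally_statuses(outputs: list[str]) -> Counter:
+    """Count the `status` field across several raw JSON outputs."""
+    return Counter(json.loads(out).get("status", "missing") for out in outputs)
+
+
+def run_repeatedly(doc_id: str, rule_number: int, runs: int = 3) -> Counter:
+    """Re-run the postpass script `runs` times and tally the verdicts."""
+    outputs = []
+    for _ in range(runs):
+        proc = subprocess.run(
+            [".venv/bin/python", SCRIPT, doc_id, "--rule-number", str(rule_number)],
+            capture_output=True, text=True, check=True,  # raises if the script exits non-zero
+        )
+        outputs.append(proc.stdout)
+    return tally_statuses(outputs)
+```
+
+Three matching statuses from `run_repeatedly("<doc_id>", N)` suggest the fix holds. A split tally
+usually means the `pass_criteria` wording still leaves room for interpretation.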
+
+---
+
+## Pre-flight checklist for agentic_audit rules
+
+In addition to the standard pre-flight in SKILL.md:
+
+- [ ] Target section is the more structured/smaller section
+- [ ] Context section is correctly listed in `context_sources` with the right `section_types`
+- [ ] `pass_criteria` describes both TARGET and CONTEXT blocks
+- [ ] Signature image note (`<img .../>` = signed) present for both target and context if needed
+- [ ] NOT_APPLICABLE threshold says "only if EVERY item is excluded"
+- [ ] Multi-row table structure (summary header vs sub-rows) explained if applicable
+- [ ] Exclusions have at least one concrete counter-example
+- [ ] Validated with the postpass script (not validate_cli) — positive and negative samples
diff --git a/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md b/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md
new file mode 100644
index 0000000..b4bb846
--- /dev/null
+++ b/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md
@@ -0,0 +1,192 @@
+# Compliance Rule Authoring Playbook
+
+This guide is for domain owners and QA reviewers writing or updating compliance rules without changing backend code.
+
+## Goals
+
+- Keep rules deterministic and explainable.
+- Minimize false positives from broad applicability.
+- Make rule intent maintainable by non-authors.
+- Preserve stable scoring across runs.
+
+## Where to edit
+
+- Rule text: `backend/app/compliance/rules/*_rules.md`
+- Rule behavior/config: `backend/app/compliance/rules/*_rules.yaml`
+- Document profile + section taxonomy: `backend/app/compliance/rules/document_profiles.yaml`
+
+Rule behavior must live in YAML, not in comments.
+
+## Authoring model
+
+Each rule has two parts:
+
+1. Human-readable statement (`.md`)
+2. Machine-executable metadata (`.yaml`)
+
+Minimum YAML fields to set for every new or changed rule:
+
+- `applicable_document_types`
+- `applicable_section_types`
+- `applicable_page_types` (if needed)
+- `pass_criteria`
+- `skip_conditions`
+
+Optional advanced fields:
+
+- `evaluation_strategy`: controls how the rule is evaluated. Values:
+  - `llm` (default): single-pass LLM against the page OCR text
+  - `vision`: VLM image pass only (requires `visual_checks`)
+  - `text_and_vision`: both OCR text (LLM) and image (VLM). Vision can **only de-escalate** — if text
+    says non_compliant and vision says compliant, vision wins; if text says compliant and vision says
+    non_compliant, text still wins. Use when OCR false positives are the main risk.
+  - `text_primary`: both channels, but vision can **only escalate** — text is authoritative unless
+    vision finds a worse violation. Use when you want vision as an additional catch-only safety net.
+  - `agentic_audit`: multi-step agent with cross-section context — see `agentic_audit_guide.md`
+- `context_sources`: list of `{document_type, section_types}` — only used with `agentic_audit`.
+  Specifies which sections are pre-loaded as raw OCR pages for the agent to cross-reference.
+- `evaluation_mode: cannot_evaluate`
+- `cannot_evaluate_reason`
+- `requires_external_data`
+- `cross_section_requirements`
+- `keywords` (only if helpful; avoid over-restricting)
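+
+The two hybrid strategies differ only in which direction vision may move the text verdict. The
+merge rule is sketched below. It assumes the severity ordering `compliant` < `uncertain` <
+`non_compliant`, and for any other status it falls back to the text verdict. The ordering, the
+fallback, and the function name are illustrative and not taken from the pipeline code:
+
+```python
+# Hypothetical severity ranking; the pipeline's real ordering may differ.
+SEVERITY = {"compliant": 0, "uncertain": 1, "non_compliant": 2}
+
+
+def merge_verdicts(strategy: str, text: str, vision: str) -> str:
+    """Combine text (LLM) and vision (VLM) verdicts for the two hybrid strategies."""
+    if text not in SEVERITY or vision not in SEVERITY:
+        return text  # assumption: not_applicable/error keep the text verdict
+    if strategy == "text_and_vision":
+        # Vision may only de-escalate: keep the less severe verdict.
+        return min(text, vision, key=SEVERITY.__getitem__)
+    if strategy == "text_primary":
+        # Vision may only escalate: keep the more severe verdict.
+        return max(text, vision, key=SEVERITY.__getitem__)
+    raise ValueError(f"not a hybrid strategy: {strategy}")
+```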
+
+## Section and document scoping
+
+Use canonical values defined in `document_profiles.yaml`.
+
+Examples:
+
+- Document types: `batch_record`, `sop`, `logbook`, `certificate`
+- Section types: `manufacturing_operations`, `material_dispensing`, `qc_report`, `line_clearance`
+
+If a section name appears differently in documents, add alias mapping in `document_profiles.yaml` instead of inventing a new section type in a rule.
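+
+Conceptually, an alias entry lets differently worded headings resolve to one canonical section
+type. The sketch below shows that lookup. The alias table is hypothetical and stands in for
+whatever structure `document_profiles.yaml` actually uses:
+
+```python
+import re
+
+# Hypothetical aliases; the real mapping lives in document_profiles.yaml.
+SECTION_ALIASES = {
+    "material_dispensing": ["dispensing record", "raw material dispensing"],
+    "line_clearance": ["line clearance", "area clearance checklist"],
+}
+
+
+def _normalize(name: str) -> str:
+    """Lower-case and collapse punctuation/whitespace so headings compare cleanly."""
+    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
+
+
+def canonical_section_type(heading: str) -> str | None:
+    """Return the canonical section type whose name or alias matches `heading`, if any."""
+    wanted = _normalize(heading)
+    for section_type, aliases in SECTION_ALIASES.items():
+        if wanted == _normalize(section_type) or wanted in {_normalize(a) for a in aliases}:
+            return section_type
+    return None
+```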
+
+## Writing good `pass_criteria`
+
+Use explicit, testable language:
+
+- Good: "Any non-empty text in `Done by`/`Checked by` columns counts as signed."
+- Bad: "Looks properly signed."
+
+Include OCR-aware instructions where relevant:
+
+- Garbled handwritten text may still be valid signature evidence.
+- Dash values (`-`, `---`, `—`) may mean not applicable.
+- OCR year artifacts should not be treated as hard data-integrity failures without context.
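+
+The first two notes can be made concrete. The sketch below classifies a `Done by`/`Checked by`
+cell under that wording. It assumes the cell text arrives as a plain string. The three labels are
+illustrative and are not pipeline statuses:
+
+```python
+DASH_VALUES = {"-", "---", "\u2014"}  # the dash placeholders listed above
+
+
+def classify_signature_cell(text: str | None) -> str:
+    """Classify a Done by / Checked by cell per the pass_criteria wording above."""
+    value = (text or "").strip()
+    if not value:
+        return "unsigned"
+    if value in DASH_VALUES:
+        return "not_applicable"
+    # Any other non-empty text, including garbled handwriting OCR, counts as signed.
+    return "signed"
+```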
+
+## Writing good `skip_conditions`
+
+Skip should represent true non-applicability, not failure.
+
+- Good: "Page has no checklist items -> not_applicable"
+- Bad: "No checklist found -> non_compliant"
+
+Prefer concrete conditions tied to page structure/content.
+
+## When to use `cannot_evaluate`
+
+Use this for rules that require data outside the packet:
+
+- Training records
+- Signature logs
+- Calibration systems
+- Archive/IT systems
+
+Set all of:
+
+- `evaluation_mode: cannot_evaluate`
+- `cannot_evaluate_reason`
+- `requires_external_data`
+
+## Cross-section rules
+
+For rules comparing two sections, set:
+
+- `scope: document` or `scope: section`
+- `cross_section_requirements` (requirement IDs understood by the deterministic resolver)
+
+Current requirement IDs include:
+
+- `operation_vs_weighing_reconciliation`
+- `material_usage_vs_dispensing`
+- `sample_sent_vs_qc_report`
+- `qc_vs_coa_consistency`
+- `inter_section_consistency`
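+
+These constraints are easy to break when copying rules between files. Below is a sketch of a
+lightweight pre-check. It is a hypothetical helper that complements, and does not replace, the
+config validator in the change checklist. It operates on one rule's mapping as parsed from
+`*_rules.yaml`. The known-ID set is copied from the list above and will drift if the resolver
+adds IDs:
+
+```python
+KNOWN_REQUIREMENTS = {
+    "operation_vs_weighing_reconciliation",
+    "material_usage_vs_dispensing",
+    "sample_sent_vs_qc_report",
+    "qc_vs_coa_consistency",
+    "inter_section_consistency",
+}
+
+
+def lint_rule(rule_id: int, rule: dict) -> list[str]:
+    """Flag config combinations this playbook warns about; not the real validator."""
+    problems = []
+    if rule.get("evaluation_mode") == "cannot_evaluate":
+        for key in ("cannot_evaluate_reason", "requires_external_data"):
+            if not rule.get(key):
+                problems.append(f"rule {rule_id}: cannot_evaluate without {key}")
+    if rule.get("context_sources") and rule.get("evaluation_strategy") != "agentic_audit":
+        problems.append(f"rule {rule_id}: context_sources is ignored unless agentic_audit")
+    requirements = rule.get("cross_section_requirements") or []
+    for req in requirements:
+        if req not in KNOWN_REQUIREMENTS:
+            problems.append(f"rule {rule_id}: unknown cross-section requirement {req!r}")
+    if requirements and rule.get("scope") not in ("document", "section"):
+        problems.append(f"rule {rule_id}: cross-section rule without scope")
+    return problems
+```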
+
+## Anti-patterns to avoid
+
+- Broad rules with no document/section scope.
+- Over-reliance on generic keywords as primary applicability logic.
+- Encoding executable logic in comments only.
+- Treating OCR artifacts as compliance failures by default.
+- Mixing pass/fail criteria with business process assumptions not present in evidence.
+- Writing "Vision: …" guidance in `pass_criteria` — vision behaviour is controlled by `visual_checks`
+  tags, not prose in `pass_criteria`. Keep `pass_criteria` focused on what the text LLM should look for.
+- Forgetting that category-level `applicable_page_types` is inherited by every rule in that category.
+  If the target section's pages are classified as `content` (not `form`), they will be silently skipped.
+  Override with `applicable_page_types: []` at the rule level and verify with `applicability_trace`.
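+
+The inheritance trap is easiest to see in code. The sketch below follows the resolution order the
+bullet above implies, using hypothetical helpers. It assumes an empty list means "no page-type
+filter", which is what the advice to override with `[]` suggests:
+
+```python
+def effective_page_types(category: dict, rule: dict) -> list[str]:
+    """Rule-level applicable_page_types wins whenever the key is present, even if empty."""
+    if "applicable_page_types" in rule:  # not `rule.get(...) or ...`: [] must override
+        return rule["applicable_page_types"]
+    return category.get("applicable_page_types", [])
+
+
+def page_in_scope(page_type: str, allowed: list[str]) -> bool:
+    """An empty list means no page-type restriction."""
+    return not allowed or page_type in allowed
+```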
+
+## Change checklist
+
+Before marking a rule update complete:
+
+1. Rule text and YAML are both updated.
+2. Document and section scopes are set.
+3. `pass_criteria` and `skip_conditions` are explicit.
+4. Any external dependency is marked `cannot_evaluate`.
+5. Config validator passes.
+
+Validator command:
+
+```bash
+cd backend/
+.venv/bin/python - <<'PY'
+from app.compliance.rules.registry import get_registry
+from app.compliance.rules.profiles import validate_compliance_configs
+validate_compliance_configs(get_registry())
+print("OK")
+PY
+```
+
+## Quick templates
+
+### Standard page rule
+
+```yaml
+12:
+  applicable_document_types: [batch_record]
+  applicable_section_types: [manufacturing_operations]
+  applicable_page_types: [form]
+  pass_criteria: >
+    <explicit condition for compliance>
+  skip_conditions:
+    - "Page has no <required structure> -> not_applicable"
+```
+
+### External dependency rule
+
+```yaml
+21:
+  evaluation_mode: cannot_evaluate
+  cannot_evaluate_reason: "Requires calibration system records"
+  requires_external_data: [calibration_records]
+```
+
+### Cross-section rule
+
+```yaml
+7:
+  scope: document
+  cross_section_requirements:
+    - sample_sent_vs_qc_report
+  pass_criteria: >
+    <explicit comparison expectation>
+```
+
+## Ownership recommendation
+
+- Domain owner: rule text + acceptance semantics.
+- QA/compliance lead: severity + process correctness.
+- Engineering owner: schema compliance + deterministic fit.
+
+This keeps separation of concerns clean while preserving fast iteration.
diff --git a/.claude/skills/audit-rule-author/scripts/analyze_results.py b/.claude/skills/audit-rule-author/scripts/analyze_results.py
new file mode 100644
index 0000000..4d9f614
--- /dev/null
+++ b/.claude/skills/audit-rule-author/scripts/analyze_results.py
@@ -0,0 +1,203 @@
+#!/usr/bin/env python3
+"""
+analyze_results.py — Filter and inspect compliance evaluation results.
+
+Usage (run from project root or backend/):
+    # Show all non_compliant in a category
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status non_compliant --category attributable
+
+    # Show all evaluations for specific pages
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --pages 14,78
+
+    # Show all uncertain evaluations across all categories
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status uncertain
+
+    # Filter by rule ID
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --rule ALC-ATT1
+
+    # Combine filters
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status non_compliant --pages 14,30,78
+
+    # Show summary only (no per-evaluation detail)
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --status non_compliant --summary-only
+
+    # List available categories and status counts
+    python <skill-path>/scripts/analyze_results.py --doc <doc_id> --list
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+
+
+def find_data_root() -> Path:
+    """Documents storage root: AT_STORAGE__BASE_PATH (the pipeline's own var),
+    else backend/data/documents / data/documents relative to cwd."""
+    candidates = []
+    env = os.environ.get("AT_STORAGE__BASE_PATH")
+    if env:
+        candidates.append(Path(env))
+    candidates += [Path("backend/data/documents"), Path("data/documents")]
+    for candidate in candidates:
+        if candidate.is_dir():
+            return candidate
+    sys.exit(
+        "ERROR: Cannot find the documents storage root. Set AT_STORAGE__BASE_PATH "
+        "or run from the repo root / backend/."
+    )
+
+
+def load_results(doc_id: str) -> dict:
+    root = find_data_root()
+    path = root / doc_id / "compliance_result.json"
+    if not path.exists():
+        sys.exit(f"ERROR: compliance_result.json not found at {path}")
+    with open(path) as f:
+        return json.load(f)
+
+
+def gather_evaluations(data: dict) -> list[dict]:
+    """Collect all_evaluations from every agent report, annotating with agent name."""
+    evals = []
+    for agent_report in data.get("agent_reports", []):
+        agent = agent_report.get("agent", "unknown")
+        for ev in agent_report.get("all_evaluations", []):
+            ev = dict(ev)
+            ev.setdefault("agent", agent)
+            evals.append(ev)
+    return evals
+
+
+def parse_pages(pages_str: str) -> set[int]:
+    pages = set()
+    for part in pages_str.split(","):
+        part = part.strip()
+        if "-" in part:
+            start, end = part.split("-", 1)
+            pages.update(range(int(start), int(end) + 1))
+        else:
+            pages.add(int(part))
+    return pages
+
+
+def format_evaluation(ev: dict, idx: int) -> str:
+    lines = [
+        f"\n{'='*60}",
+        f"[{idx}] {ev.get('rule_id', '?')} — {ev.get('rule_text', '')}",
+        f"    Agent    : {ev.get('agent', '?')}",
+        f"    Category : {ev.get('rule_category', '?')}",
+        f"    Status   : {ev.get('status', '?')}  (confidence: {ev.get('confidence', '?')})",
+        f"    Pages    : {ev.get('page_numbers', [])}",
+    ]
+    if ev.get("reasoning"):
+        lines.append(f"    Reasoning: {ev['reasoning'][:300]}{'...' if len(ev.get('reasoning','')) > 300 else ''}")
+    if ev.get("evidence"):
+        lines.append(f"    Evidence : {ev['evidence'][:200]}{'...' if len(ev.get('evidence','')) > 200 else ''}")
+    if ev.get("applicability_trace"):
+        lines.append(f"    Trace    : {ev['applicability_trace']}")
+    return "\n".join(lines)
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Filter compliance evaluation results")
+    parser.add_argument("--doc", required=True, help="Document ID")
+    parser.add_argument("--status", help="Filter by status: compliant|non_compliant|not_applicable|uncertain|error")
+    parser.add_argument("--category", help="Filter by rule category slug (e.g. attributable, legible)")
+    parser.add_argument("--pages", help="Filter by page numbers (e.g. 14,78 or 10-20)")
+    parser.add_argument("--rule", help="Filter by rule ID (e.g. ALC-ATT1)")
+    parser.add_argument("--agent", help="Filter by agent name (e.g. alcoa, checklist)")
+    parser.add_argument("--summary-only", action="store_true", help="Show counts only, no detail")
+    parser.add_argument("--list", action="store_true", help="List categories and status counts, then exit")
+    args = parser.parse_args()
+
+    data = load_results(args.doc)
+    all_evals = gather_evaluations(data)
+
+    print(f"\nDocument : {data.get('filename', args.doc)}")
+    print(f"Doc ID   : {args.doc}")
+    print(f"Pages    : {data.get('total_pages', '?')}")
+    print(f"Agents   : {[r.get('agent', 'unknown') for r in data.get('agent_reports', [])]}")
+    print(f"Total evaluations loaded: {len(all_evals)}")
+
+    if args.list:
+        from collections import Counter
+        cat_counts: dict[str, Counter] = {}
+        for ev in all_evals:
+            cat = ev.get("rule_category", "unknown")
+            status = ev.get("status", "unknown")
+            if cat not in cat_counts:
+                cat_counts[cat] = Counter()
+            cat_counts[cat][status] += 1
+        print("\n--- Categories and status counts ---")
+        for cat in sorted(cat_counts):
+            counts = cat_counts[cat]
+            total = sum(counts.values())
+            print(f"\n  {cat} ({total} evaluations)")
+            for status in ["non_compliant", "uncertain", "compliant", "not_applicable", "error"]:
+                if counts[status]:
+                    print(f"    {status}: {counts[status]}")
+        return
+
+    # Apply filters
+    filtered = all_evals
+
+    if args.agent:
+        filtered = [e for e in filtered if e.get("agent") == args.agent]
+
+    if args.status:
+        filtered = [e for e in filtered if e.get("status") == args.status]
+
+    if args.category:
+        cat_lower = args.category.lower()
+        filtered = [e for e in filtered if (e.get("rule_category") or "").lower() == cat_lower]
+
+    if args.rule:
+        filtered = [e for e in filtered if e.get("rule_id") == args.rule]
+
+    if args.pages:
+        page_set = parse_pages(args.pages)
+        filtered = [e for e in filtered if any(p in page_set for p in (e.get("page_numbers") or []))]
+
+    print(f"\nFiltered : {len(filtered)} evaluation(s)")
+
+    if not filtered:
+        print("No evaluations match the given filters.")
+        return
+
+    # Summary table
+    print("\n--- Summary ---")
+    print(f"{'Rule ID':<20} {'Category':<18} {'Status':<16} {'Conf':<6} {'Pages'}")
+    print("-" * 80)
+    for ev in filtered:
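+        pages_str = str(ev.get("page_numbers") or [])
+        # `or` guards against explicit nulls, which would break the format specs below
+        conf = ev.get("confidence") or 0.0
+        rule_id = ev.get("rule_id") or "?"
+        category = ev.get("rule_category") or "?"
+        status = ev.get("status") or "?"
+        print(f"{rule_id:<20} {category:<18} {status:<16} {conf:<6.2f} {pages_str}")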
+
+    if args.summary_only:
+        return
+
+    # Detailed output
+    print("\n--- Detail ---")
+    for i, ev in enumerate(filtered, 1):
+        print(format_evaluation(ev, i))
+
+    # Emit machine-readable footer for skill use
+    failing_pages: dict[str, list[int]] = {}
+    for ev in filtered:
+        rule_id = ev.get("rule_id", "")
+        pages = ev.get("page_numbers") or []
+        if pages:
+            existing = failing_pages.get(rule_id, [])
+            for p in pages:
+                if p not in existing:
+                    existing.append(p)
+            failing_pages[rule_id] = existing
+
+    print("\n--- Failing pages per rule (for validation) ---")
+    print(json.dumps(failing_pages, indent=2))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/.claude/skills/audit-rule-author/scripts/peek_pages.py b/.claude/skills/audit-rule-author/scripts/peek_pages.py
new file mode 100644
index 0000000..8434b89
--- /dev/null
+++ b/.claude/skills/audit-rule-author/scripts/peek_pages.py
@@ -0,0 +1,209 @@
+#!/usr/bin/env python3
+"""
+peek_pages.py — Inspect OCR markdown content and section metadata for document pages.
+
+Usage (run from project root or backend/):
+    python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33,34
+    python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33-35
+    python <skill-path>/scripts/peek_pages.py --doc <doc_id> --pages 33 --no-markdown
+    python <skill-path>/scripts/peek_pages.py --doc <doc_id> --list-sections
+
+Data is read from the document storage root (``AT_STORAGE__BASE_PATH``, or
+``backend/data/documents`` when run from the repo root)/<doc_id>/:
+  - result.json        → raw OCR markdown per page
+  - segmentation.json  → page-range sections (section_type, name, document_type)
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+
+
+def find_data_root(explicit: str | None = None) -> Path:
+    """Locate the document storage root that contains the <doc_id> folders.
+
+    Resolution order: explicit ``--data-root`` → ``AT_STORAGE__BASE_PATH`` (the
+    same var the pipeline/API use) → ``backend/data/documents`` / ``data/documents``
+    relative to cwd.
+    """
+    candidates = []
+    if explicit:
+        candidates.append(Path(explicit))
+    env = os.environ.get("AT_STORAGE__BASE_PATH")
+    if env:
+        candidates.append(Path(env))
+    candidates += [Path("backend/data/documents"), Path("data/documents")]
+    for candidate in candidates:
+        if candidate.is_dir():
+            return candidate
+    sys.exit(
+        "ERROR: Cannot find the documents storage root. Pass --data-root, set "
+        "AT_STORAGE__BASE_PATH, or run from the repo root / backend/."
+    )
+
+
+def parse_pages(spec: str) -> list[int]:
+    """Parse '33', '33,34', or '33-35' into a sorted list of ints."""
+    pages = []
+    for part in spec.split(","):
+        part = part.strip()
+        if "-" in part:
+            lo, hi = part.split("-", 1)
+            pages.extend(range(int(lo), int(hi) + 1))
+        else:
+            pages.append(int(part))
+    return sorted(set(pages))
+
+
+def load_markdown(doc_dir: Path) -> dict[str, str]:
+    result_path = doc_dir / "result.json"
+    if not result_path.exists():
+        sys.exit(f"ERROR: result.json not found at {result_path}")
+    with result_path.open() as f:
+        data = json.load(f)
+    raw = data.get("raw_markdown", {})
+    if not isinstance(raw, dict):
+        sys.exit("ERROR: raw_markdown in result.json is not a dict — unexpected format")
+    return raw  # keys are string page numbers
+
+
+def load_classification(doc_dir: Path) -> dict[int, dict]:
+    """Return per-page section metadata keyed by page number.
+
+    Primary source is ``segmentation.json`` (a ``DocumentSegmentation``
+    with page-range ``sections``); each section's ``start_page..end_page`` is
+    expanded to per-page rows. Falls back to a legacy per-page
+    ``classification.yaml`` if present (older pipeline output).
+    """
+    seg_path = doc_dir / "segmentation.json"
+    if seg_path.exists():
+        with seg_path.open() as f:
+            data = json.load(f)
+        out: dict[int, dict] = {}
+        for sec in data.get("sections", []):
+            start, end = sec.get("start_page", 0), sec.get("end_page", 0)
+            if not start or not end:
+                continue
+            for pn in range(start, end + 1):
+                out[pn] = {
+                    "section_type_id": sec.get("section_type", "unknown"),
+                    "section_name": sec.get("name", ""),
+                    "page_role": sec.get("document_type", "—"),
+                    "detection_notes": sec.get("description", ""),
+                }
+        return out
+
+    # Legacy fallback: per-page classification.yaml (older pipeline output)
+    cls_path = doc_dir / "classification.yaml"
+    if not cls_path.exists():
+        return {}
+    try:
+        import yaml
+    except ImportError:
+        return {}
+    with cls_path.open() as f:
+        data = yaml.safe_load(f) or {}  # an empty file parses to None
+    pages = data.get("pages", [])
+    return {p["page_number"]: p for p in pages if "page_number" in p}
+
+
+def print_separator(label: str = "") -> None:
+    line = "─" * 70
+    if label:
+        print(f"\n{line}")
+        print(f"  {label}")
+        print(line)
+    else:
+        print(line)
+
+
+def list_sections(classification: dict[int, dict]) -> None:
+    if not classification:
+        print("No section metadata found (segmentation.json or classification.yaml).")
+        return
+    # Group runs of consecutive pages that share a section type and name;
+    # a page gap or a change of section closes the current run.
+    rows = []
+    run_key = run_start = prev_pn = None
+    for pn in sorted(classification):
+        meta = classification[pn]
+        key = (meta.get("section_type_id", "unknown"), meta.get("section_name", ""))
+        contiguous = prev_pn is not None and pn == prev_pn + 1
+        if key != run_key or not contiguous:
+            if run_key is not None:
+                rows.append((run_start, prev_pn, *run_key))
+            run_key, run_start = key, pn
+        prev_pn = pn
+    if run_key is not None:
+        rows.append((run_start, prev_pn, *run_key))
+
+    print(f"\n{'PAGE RANGE':<16} {'SECTION TYPE':<35} {'DISPLAY NAME'}")
+    print("─" * 80)
+    for start, end, sec_type, display in rows:
+        rng = f"{start}" if start == end else f"{start}–{end}"
+        print(f"  {rng:<14} {sec_type:<35} {display}")
+
+
+def show_pages(pages: list[int], markdown: dict, classification: dict, show_md: bool) -> None:
+    for pn in pages:
+        key = str(pn)
+        md = markdown.get(key)
+        meta = classification.get(pn, {})
+
+        print_separator(f"PAGE {pn}")
+
+        # Section metadata
+        if meta:
+            print(f"  Section type : {meta.get('section_type_id', '—')}")
+            print(f"  Section name : {meta.get('section_name', '—')}")
+            print(f"  Page role    : {meta.get('page_role', '—')}")
+            notes = meta.get("detection_notes", "")
+            if notes:
+                print(f"  Notes        : {notes}")
+        else:
+            print("  (No classification metadata for this page)")
+
+        if show_md:
+            print()
+            if md:
+                print(md)
+            else:
+                print(f"  [No markdown content found for page {pn}]")
+
+    print_separator()
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Inspect OCR markdown content for document pages")
+    parser.add_argument("--doc", required=True, help="Document ID (folder name under the documents storage root)")
+    parser.add_argument("--pages", help="Pages to inspect: '33', '33,34', or '33-35'")
+    parser.add_argument("--list-sections", action="store_true", help="List all sections with page ranges")
+    parser.add_argument("--no-markdown", action="store_true", help="Show section metadata only, skip markdown content")
+    parser.add_argument("--data-root", help="Documents storage root (defaults to AT_STORAGE__BASE_PATH or backend/data/documents)")
+    args = parser.parse_args()
+
+    data_root = find_data_root(args.data_root)
+    doc_dir = data_root / args.doc
+    if not doc_dir.is_dir():
+        sys.exit(f"ERROR: Document directory not found: {doc_dir}")
+
+    markdown = load_markdown(doc_dir)
+    classification = load_classification(doc_dir)
+
+    if args.list_sections:
+        list_sections(classification)
+        return
+
+    if not args.pages:
+        parser.error("Provide --pages <spec> or --list-sections")
+
+    pages = parse_pages(args.pages)
+    show_pages(pages, markdown, classification, show_md=not args.no_markdown)
+
+
+if __name__ == "__main__":
+    main()