diff --git a/.claude/skills/audit-rule-author/SKILL.md b/.claude/skills/audit-rule-author/SKILL.md new file mode 100644 index 0000000..b10c988 --- /dev/null +++ b/.claude/skills/audit-rule-author/SKILL.md @@ -0,0 +1,484 @@ +--- +name: audit-rule-author +description: > + Author and maintain compliance audit rule configuration files for the Docs-digitization + pipeline. Use this skill whenever the user wants to: add a new rule to an existing audit + framework, update an existing rule's config or pass criteria, reduce false positives from + a rule, create an entirely new audit framework (new rule set), tune skip conditions or + severity, configure cannot_evaluate for rules needing external data, change a rule's + scope or applicable document/section types, or analyze compliance_result.json evaluation + results to diagnose and improve rules. Triggers on phrases like: "add a rule", + "update rule", "create a new audit", "write a compliance check", "too many false positives", + "rule is firing incorrectly", "add a new framework", "configure audit rules", + "author compliance rules", "new audit type", "analyze results", "look at findings", + "why is this rule failing", "what's non-compliant", "review evaluation results". +--- + +# Audit Rule Authoring + +You help author and maintain compliance audit rule configuration files for the +Docs-digitization pipeline. Rules live in `backend/app/compliance/rules/`. + +**Read `references/rule_authoring_playbook.md` now** — it is the authoritative guide for +all authoring conventions. Follow it exactly. This SKILL.md only covers the workflow +orchestration on top of it. + +If the rule you are authoring or fixing uses `evaluation_strategy: agentic_audit`, also +**read `references/agentic_audit_guide.md`** — it covers target vs context design, +pass_criteria patterns (signature images, multi-row tables, NOT_APPLICABLE thresholds), +and validation with the postpass script. 
+ +--- + +## Document data location + +All processed document data lives under the storage root — `AT_STORAGE__BASE_PATH` +(the same var the pipeline/API use), or `backend/data/documents` when run from the +repo root — at `//`: + +``` +// +├── result.json ← OCR output: raw_markdown per page (keyed by string page number) +├── segmentation.json ← page-range sections (section_type, name, document_type, start/end page) +└── ... +``` + +(`peek_pages.py` accepts `--data-root` and honors `AT_STORAGE__BASE_PATH` if your +docs live elsewhere.) + +**Before writing any rule**, use the `peek_pages.py` script to read the actual page +content so your `pass_criteria` is grounded in real evidence: + +```bash +# List all sections and their page ranges for a document +python /scripts/peek_pages.py --doc --list-sections + +# View full OCR markdown for specific pages +python /scripts/peek_pages.py --doc --pages 33,34 + +# View section metadata only (no markdown dump) +python /scripts/peek_pages.py --doc --pages 33 --no-markdown + +# Page ranges also work +python /scripts/peek_pages.py --doc --pages 33-35 +``` + +The script path is `/scripts/peek_pages.py`. The skill base directory +is shown at the top of this file when the skill loads. + +When the user gives you a doc ID and page numbers, always `--list-sections` first to +confirm section types before authoring rules — it prevents scoping mistakes. + +--- + +## Determine the task + +Ask the user one question to orient yourself if it isn't already clear: + +> **"Are you (a) adding or updating rules in an existing audit framework, or (b) creating a brand-new audit framework?"** + +Then follow the appropriate path below. + +--- + +## Path A: Add or update rules in an existing framework + +### 1. 
Read context files + +Read all three before touching anything: + +``` +backend/app/compliance/rules/_rules.md +backend/app/compliance/rules/_rules.yaml +backend/app/compliance/rules/document_profiles.yaml ← canonical document/section types +``` + +The `document_profiles.yaml` is the source of truth for all valid `applicable_document_types` +and `applicable_section_types` values. Never invent type names — use only what is defined there. + +### 2. Author the rule(s) + +Follow the authoring conventions in `references/rule_authoring_playbook.md`: +- Rule text goes in `_rules.md` +- YAML config goes in `_rules.yaml` +- Use the minimum required fields plus any applicable optional fields + +### 3. Validate the rule against real pages + +Always peek the target pages first so `pass_criteria` is grounded in actual OCR content: + +```bash +python /scripts/peek_pages.py --doc --list-sections +python /scripts/peek_pages.py --doc --pages +``` + +Then choose the right validation tool based on `evaluation_strategy`. + +#### For `llm`, `vision`, `text_and_vision`, or `text_primary` rules — use `validate_cli` + +Ask the user for a doc ID and page numbers (one or two that should PASS, one or two that +should FAIL) if you don't already have them. + +```bash +cd backend/ + +# positive samples — expect PASS +.venv/bin/python -m app.compliance.rules.validate_cli \ + --agent --rule \ + --doc --pages --expect pass + +# negative samples — expect FAIL +.venv/bin/python -m app.compliance.rules.validate_cli \ + --agent --rule \ + --doc --pages --expect fail +``` + +Add `--no-vlm` to skip vision during quick iteration. The CLI exits non-zero if any page +contradicts the `--expect` outcome. 
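The positive/negative runs above can also be scripted when many samples need checking. The following is a minimal Python sketch, not part of the repo: `build_validate_cmd` and `sample_matches` are hypothetical helper names, and it assumes `--pages` takes a comma-separated list in the same way `peek_pages.py` does. Only the flags already shown above are used.

```python
import subprocess
from typing import List, Sequence


def build_validate_cmd(
    agent: str,
    rule: str,
    doc: str,
    pages: Sequence[int],
    expect: str,
    no_vlm: bool = False,
) -> List[str]:
    """Assemble a validate_cli invocation using only the flags shown above."""
    if expect not in ("pass", "fail"):
        raise ValueError("expect must be 'pass' or 'fail'")
    cmd = [
        ".venv/bin/python", "-m", "app.compliance.rules.validate_cli",
        "--agent", agent,
        "--rule", rule,
        "--doc", doc,
        # Assumption: pages are passed comma-separated, as with peek_pages.py.
        "--pages", ",".join(str(p) for p in pages),
        "--expect", expect,
    ]
    if no_vlm:
        cmd.append("--no-vlm")  # skip vision during quick iteration
    return cmd


def sample_matches(cmd: List[str]) -> bool:
    """Run from backend/; the CLI exits non-zero when a page contradicts --expect."""
    return subprocess.run(cmd).returncode == 0
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.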
+ +Diagnose failures from the `reasoning` field: +- Too strict → loosen language or add OCR-tolerance notes +- Too vague → tighten criteria or add concrete examples +- Wrong scope → adjust `applicable_section_types` or `applicable_page_types` +- Wrong skip → a page is being skipped that shouldn't be, or vice versa + +#### For `agentic_audit` rules — use the postpass script + +**Do not use `validate_cli` for `agentic_audit` rules** — it ignores `context_sources` +entirely and produces misleading results. + +Use the framework's agentic postpass script instead (always from `backend/`): + +```bash +cd backend/ +.venv/bin/python scripts/run__agentic_postpass.py --rule-number + +# Add --verbose-debug for full step-by-step agent traces on stderr: +.venv/bin/python scripts/run__agentic_postpass.py --rule-number --verbose-debug +``` + +The script outputs JSON with `status`, `confidence`, `reasoning`, and `evidence`. +Read the `reasoning` field first — it almost always identifies exactly what went wrong. + +For full authoring and debugging guidance for `agentic_audit` rules, read +**`references/agentic_audit_guide.md`** (loaded at the top of this session if the rule +uses this strategy). + +### 4. Config validator + +```bash +cd backend/ +.venv/bin/python - <<'PY' +from app.compliance.rules.registry import get_registry +from app.compliance.rules.profiles import validate_compliance_configs +validate_compliance_configs(get_registry()) +print("OK") +PY +``` + +Fix any errors before reporting done. + +--- + +## Path B: Create a new audit framework + +A new framework requires three files. Create them in this order: + +### 1. Register the framework + +Add an entry to `backend/app/compliance/rules/agents_meta.json`: + +```json +{ + "id": "", + "label": "", + "description": "" +} +``` + +`framework_id` must be lowercase, no spaces (e.g., `data_integrity`, `environmental`). + +### 2. Create the rule text file + +Create `backend/app/compliance/rules/_rules.md`. 
+ +Structure it with category sections matching the framework's logical groupings. +Each rule is one numbered line: `. ` + +```markdown +Category: + +1. +2. + +-------------------------------------------------- + +Category: + +3. +``` + +### 3. Create the YAML config file + +Create `backend/app/compliance/rules/_rules.yaml`. + +Start from this skeleton — fill in what's known, leave optional fields at defaults: + +```yaml +defaults: + scope: page + severity: observation + evaluation_mode: llm + applicable_document_types: [] + excluded_document_types: [] + applicable_page_types: [] + applicable_section_types: [] + cross_section_requirements: [] + keywords: [] + pass_criteria: "" + skip_conditions: [] + cannot_evaluate_reason: "" + requires_external_data: [] + notes: "" + +categories: + : + severity: + applicable_page_types: [] + rules: + 1: + pass_criteria: > + + skip_conditions: + - " -> not_applicable" +``` + +Read `document_profiles.yaml` for valid document and section type values. + +### 4. Validate each rule against real pages + +Follow the same CLI-based validation process as Path A, Step 3. For a new framework, aim +for at least one positive and one negative sample per rule before committing the files. + +### 5. Config validator + +Run the validator (same command as Path A). Since the new agent class doesn't exist yet, +the validator may warn about the unregistered agent — that is expected. Note it to the user: +**agent class wiring is a separate engineering task** beyond this skill's scope. + +--- + +## Path C: Analyze compliance results and improve rules + +Use this path when the user wants to diagnose why rules are failing, firing incorrectly, +or producing uncertain results — by inspecting actual evaluation output. + +### 1. 
Filter evaluations + +Use `analyze_results.py` to slice `compliance_result.json` by any combination of +category, status, and page numbers: + +```bash +# List categories and status counts — always start here for an overview +python /scripts/analyze_results.py --doc --list + +# Filter by status + category +python /scripts/analyze_results.py --doc --status non_compliant --category attributable + +# Filter by specific pages +python /scripts/analyze_results.py --doc --pages 14,78,99 + +# Filter by specific rule +python /scripts/analyze_results.py --doc --rule ALC-ATT1 + +# Combine filters — e.g. all uncertain on pages 10-30 +python /scripts/analyze_results.py --doc --status uncertain --pages 10-30 + +# Summary table only, no detail +python /scripts/analyze_results.py --doc --status non_compliant --summary-only +``` + +The script prints a summary table, per-evaluation detail (reasoning + evidence), and a +`Failing pages per rule` JSON block at the end — use that JSON to drive validation in step 3. + +### 2. Analyze and propose changes + +Read the `reasoning` and `evidence` fields of the filtered evaluations and identify patterns: + +- **False positives** (rule fires but shouldn't): OCR artifact in reasoning? Skip condition + too narrow? `pass_criteria` missing an acceptable entry type? +- **False negatives** (rule should fire but doesn't): `pass_criteria` too lenient? Missing + a required check? +- **Uncertain** results: `pass_criteria` ambiguous? Conflicting text/vision verdicts? +- **not_applicable over-firing**: `skip_conditions` too broad? + +Cross-reference the current rule config in `_rules.yaml` before proposing. + +Present proposed changes clearly and **wait for user confirmation** before editing any file. + +### 3. Validate after changes + +After the user confirms and you've edited the YAML, re-run validation against the exact +pages that appeared in the filtered results. Use the `Failing pages per rule` JSON from +step 1 to know which pages to target. 
+ +**For `llm`, `vision`, `text_and_vision`, `text_primary` rules:** + +```bash +cd backend/ +.venv/bin/python -m app.compliance.rules.validate_cli \ + --agent --rule \ + --doc --pages --expect pass +``` + +**For `agentic_audit` rules** — page numbers don't apply; run at doc level: + +```bash +cd backend/ +.venv/bin/python scripts/run__agentic_postpass.py --rule-number +``` + +If validation still fails, iterate on `pass_criteria` and re-run until the pages that +previously failed now pass. + +--- + +## Evaluation strategy selection + +### Step 1 — Does the rule need vision at all? + +Ask: *"Could OCR text alone reliably catch this violation?"* + +- **Yes, text is sufficient** → use `llm` (default). No `visual_checks` needed. + - Examples: keyword presence, date format checks, missing field labels, page numbering. +- **No, the violation is only detectable from the image** → use `vision`. Requires at least one `visual_checks` tag (see below). + - Examples: smudge/fading, sticky notes, correction fluid, ink color. +- **Both channels add value** → use `text_and_vision` or `text_primary` (see Step 2). + +### Step 2 — If both channels are used, can vision override text? + +| Strategy | When text says non_compliant but vision disagrees | When text says compliant but vision disagrees | Choose when… | +|---|---|---|---| +| `text_and_vision` | **Vision wins (de-escalates)** | **Text wins** | Vision rescues OCR false positives; vision cannot add new violations | +| `text_primary` | Text wins | Vision wins (escalates) | Vision is a catch-only safety net — text is authoritative, vision can only add violations | +| `llm_arbitrated` | A third LLM call resolves the conflict | — | Verdicts frequently conflict AND both channels are unreliable alone; expensive — use sparingly | + +**`text_and_vision`** — vision can ONLY de-escalate. When text is more severe than vision, vision wins and lowers +the verdict. 
When vision is more severe than text (vision non-compliant, text compliant), TEXT still wins — vision +cannot introduce a new violation. Use when OCR false positives are your main risk. + +**`text_primary`** — vision can ONLY escalate. When vision is more severe than text, vision wins and raises +the verdict. When text is more severe (text non-compliant, vision compliant), TEXT wins. Use when text is the +authoritative source AND you also want vision to catch visual-only violations text cannot see. + +**`llm_arbitrated`** — adds a third LLM call that sees both text and vision verdicts and resolves disagreements. +Only justify this when the two channels regularly produce conflicting verdicts on the same evidence AND you cannot +determine which should win by rule (e.g., OCR confidence is too variable to make `text_primary` reliable, but +the page also contains visual signals that text can't capture). It is the most expensive strategy; prefer +`text_and_vision` or `text_primary` first. + +### Step 3 — If using vision, choose the right `visual_checks` tag(s) + +**Vision evaluation only works through declared `visual_checks` tags.** The pipeline sends each tag's +domain-specific prompt to the VLM. Do not invent new tags — use only the supported list below. +If no existing tag fits, fall back to `pass_criteria` text in the VLM prompt (strategy `vision` with no +`visual_checks` will use a generic visual inspection prompt, which is less reliable). + +> **`pass_criteria` reaches both evaluators.** The text LLM and the VLM both receive `pass_criteria` as +> part of their prompt. Write `pass_criteria` to describe what valid evidence looks like for the text +> evaluator — OCR-aware guidance, valid entry formats, table layout notes. Do NOT add "Vision: …" +> subsections; vision behaviour is controlled exclusively by `visual_checks` tags. 
+ +**Supported `visual_checks` tags:** + +| Tag | What it detects | +|-----|----------------| +| `VC-STRIKE` | Correction methodology — single-line strikethrough vs. scribble/overwrite; checks initials+date adjacent | +| `VC-SIGNATURE` | Signature field status — wet ink / typed / rubber stamp / empty; checks for accompanying date | +| `VC-INK-COLOR` | Ink color compliance — blue/black (pass), pencil or non-permanent (fail) | +| `VC-CORRECTION` | Prohibited corrections — white-out, erasure marks, overwriting, tape corrections | +| `VC-STAMP-SEAL` | Stamps/seals/watermarks — ORIGINAL, CONTROLLED COPY, COPY, QA approval stamps | +| `VC-ATTACHMENT` | Physical attachments — affixed labels/stickers, detached items, empty attachment spaces | +| `VC-BARCODE` | Barcode/label readability — clear vs. smudged/damaged/obscured | +| `VC-BLANK-FIELD` | Blank form fields — empty cells vs. filled / dash / N/A | +| `VC-DOC-QUALITY` | Physical page quality — smudges, fading, water damage, tears, pencil marks | +| `VC-CHART` | Chart/graph quality — axis labels, scale markings, units, data clarity | +| `VC-CHROMATOGRAM` | Chromatogram integrity — peak labels, baseline, integration marks, printout metadata | +| `VC-CHECKBOX` | Checkbox/tickmark status — checked / unchecked / N/A counts | +| `VC-PAGINATION` | Page numbering — "Page X of Y" in header/footer, legibility | +| `VC-STICKY-NOTE` | Temporary annotations — sticky notes, pencil annotations, draft watermarks | + +A rule can declare multiple tags; the VLM prompt is the union of all declared check descriptions. + +### Quick-reference: strategy × use case + +| Use case | Strategy | visual_checks | +|----------|----------|---------------| +| Text fields, dates, keywords | `llm` | none | +| Signatures, handwritten entries | `text_and_vision` | `VC-SIGNATURE` | +| Blank fields (OCR may miss empty cells) | `text_and_vision` | `VC-BLANK-FIELD` | +| Correction method (strikethrough vs. 
scribble) | `text_and_vision` | `VC-STRIKE` | +| Ink color | `vision` | `VC-INK-COLOR` | +| White-out / erasure / overwrite | `text_and_vision` | `VC-CORRECTION` | +| Page numbering (OCR splits "Page\n1 of 35") | `text_and_vision` | `VC-PAGINATION` | +| Document smudging / physical damage | `vision` | `VC-DOC-QUALITY` | +| Sticky notes / temporary annotations | `vision` | `VC-STICKY-NOTE` | +| Barcodes / labels | `vision` | `VC-BARCODE` | +| Charts / graphs | `vision` | `VC-CHART` | +| Frequently conflicting text/vision with no clear winner | `llm_arbitrated` | as needed | + +--- + +## Evaluation strategy merge behavior (reference) + +Merge logic is severity-based: non_compliant(4) > uncertain(3) > error(2) > compliant(1) > not_applicable(0). + +| Strategy | Tie (same severity) | Vision more severe | Text more severe | +|---|---|---|---| +| `text_and_vision` | **Text wins** | **Text wins** | Vision wins (de-escalates) | +| `text_primary` | **Text wins** | Vision wins (escalates) | **Text wins** | +| `vision` | Vision only | — | — | +| `llm_arbitrated` | Agreed verdict returned directly; LLM arbitrates conflicts | | | + +Key insight for `text_and_vision`: vision can **only** lower a verdict, never raise it. If text says compliant +and vision says non-compliant, text still wins. This makes it safe to pair with any `VC-*` tag — a noisy +vision check cannot introduce a false positive when text has already found the evidence of compliance. 
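The merge table above can be read as a small decision function. The following is a minimal Python sketch of that reading, not the pipeline's actual implementation: the names `SEVERITY`, `merge_verdicts`, and the `arbitrate` callback are hypothetical and introduced only for illustration.

```python
from typing import Callable, Optional

# Severity ranks as stated in this section.
SEVERITY = {
    "non_compliant": 4,
    "uncertain": 3,
    "error": 2,
    "compliant": 1,
    "not_applicable": 0,
}


def merge_verdicts(
    strategy: str,
    text: str,
    vision: str,
    arbitrate: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Combine a text verdict and a vision verdict per the table above (illustrative only)."""
    text_rank, vision_rank = SEVERITY[text], SEVERITY[vision]

    if strategy == "llm":
        return text  # text channel only
    if strategy == "vision":
        return vision  # vision channel only
    if strategy == "text_and_vision":
        # Vision may only de-escalate: it wins only when text is strictly more severe.
        return vision if text_rank > vision_rank else text
    if strategy == "text_primary":
        # Vision may only escalate: it wins only when it is strictly more severe.
        return vision if vision_rank > text_rank else text
    if strategy == "llm_arbitrated":
        if text == vision:
            return text  # agreed verdict is returned directly
        if arbitrate is None:
            raise ValueError("llm_arbitrated needs an arbitrate callback for conflicts")
        return arbitrate(text, vision)  # hypothetical stand-in for the third LLM call
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `merge_verdicts("text_and_vision", "compliant", "non_compliant")` returns `"compliant"`, while the same inputs under `text_primary` return `"non_compliant"`.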
+ +--- + +## Pre-flight checklist (all paths) + +- [ ] Rule text in `.md` matches YAML entries (same numbers, same count) +- [ ] Document and section types use canonical values from `document_profiles.yaml` +- [ ] `pass_criteria` is explicit and testable (see playbook for what "explicit" means) +- [ ] No "Vision: …" subsections in `pass_criteria` — vision is controlled by `visual_checks` tags only +- [ ] `skip_conditions` represent true non-applicability, not failure +- [ ] External dependencies use `cannot_evaluate` with reason + data list +- [ ] **Page type filter checked**: if the category sets `applicable_page_types: [form]`, verify the target + section's pages are actually classified as `form`. If not (e.g., `content`), override with + `applicable_page_types: []` at the rule level. Use `applicability_trace` in `compliance_result.json` + to diagnose — a `page_type: fail` trace entry is the symptom. +- [ ] Validation passed on at least one positive and one negative sample: + - `llm`/`vision`/`text_and_vision`/`text_primary` → `validate_cli` + - `agentic_audit` → postpass script (see `references/agentic_audit_guide.md`) +- [ ] Config validator passes (or expected warnings are explained) + +### Diagnosing "rule not firing" with applicability_trace + +When a rule returns `not_applicable` unexpectedly, read its `applicability_trace` in +`compliance_result.json` (under `agent_reports[].all_evaluations[]`). 
Each entry explains exactly +which filter caused the skip: + +- `section_type: fail` → rule's `applicable_section_types` doesn't match the page's section +- `page_type: fail` → **inherited** `applicable_page_types` from the category is blocking it; + override the rule with `applicable_page_types: []` +- `document_type: fail` → document type mismatch + +--- + +## References + +| File | When to read | +|------|-------------| +| `references/rule_authoring_playbook.md` | Every session — authoring conventions, field semantics, anti-patterns | +| `references/agentic_audit_guide.md` | When authoring or debugging any `agentic_audit` rule — target/context design, pass_criteria patterns, postpass validation | diff --git a/.claude/skills/audit-rule-author/evals/evals.json b/.claude/skills/audit-rule-author/evals/evals.json new file mode 100644 index 0000000..8317dbe --- /dev/null +++ b/.claude/skills/audit-rule-author/evals/evals.json @@ -0,0 +1,26 @@ +{ + "skill_name": "audit-rule-author", + "evals": [ + { + "id": 0, + "name": "update-existing-rule", + "prompt": "Rule 27 in the ALCOA framework keeps flagging pages as non-compliant because OCR is misreading handwritten times — things like '14:30' being read as '14;30' or '1430'. Can you update the pass criteria to handle these OCR artifacts properly?", + "expected_output": "Updated pass_criteria for rule 27 in alcoa_rules.yaml that explicitly handles OCR time-reading artifacts, validator passes.", + "files": [] + }, + { + "id": 1, + "name": "add-to-gmp", + "prompt": "I need to add a rule to the GMP framework: equipment calibration due dates must be recorded alongside each equipment usage entry in manufacturing operations. 
If the calibration is expired or not recorded, that's a major finding.", + "expected_output": "New rule added to gmp_rules.md with rule text and gmp_rules.yaml with proper YAML config (applicable_section_types, pass_criteria, skip_conditions, severity), validator passes.", + "files": [] + }, + { + "id": 2, + "name": "new-framework", + "prompt": "We need a brand-new audit framework called 'lab_notebook' for laboratory notebooks. It should cover three things: (1) raw data entries are not altered or overwritten, (2) each entry has a date and the researcher's signature, (3) any corrections use proper strikethrough with initials and date. Set severity to major for all rules.", + "expected_output": "agents_meta.json updated with lab_notebook entry, lab_notebook_rules.md created with 3 rules, lab_notebook_rules.yaml created with proper config for all 3 rules.", + "files": [] + } + ] +} diff --git a/.claude/skills/audit-rule-author/references/agentic_audit_guide.md b/.claude/skills/audit-rule-author/references/agentic_audit_guide.md new file mode 100644 index 0000000..e93d673 --- /dev/null +++ b/.claude/skills/audit-rule-author/references/agentic_audit_guide.md @@ -0,0 +1,257 @@ +# Agentic Audit Rules — Authoring Guide + +Rules with `evaluation_strategy: agentic_audit` run a multi-step LLM agent rather than a +single-pass evaluator. The agent receives: (a) the primary section's raw OCR pages and +(b) pre-loaded raw OCR pages from `context_sources`. It reasons across both before +producing a verdict. + +Read this guide whenever you are **authoring or debugging** an `agentic_audit` rule. 
+ +--- + +## When to choose agentic_audit vs llm + +| Situation | Strategy | +|-----------|----------| +| Single-page check against the page itself | `llm` | +| Cross-reference between two different sections | `agentic_audit` | +| Each row in a table must be verified against another section | `agentic_audit` | +| Simple presence/format/signature check | `llm` | + +`agentic_audit` is slower and more expensive. Use it only when the compliance check +genuinely requires cross-section evidence that cannot be encoded in a single `pass_criteria`. + +--- + +## Designing target vs context + +The agent evaluates the **target section** (set by `applicable_section_types`) and uses +the **context section** (set by `context_sources`) as reference material. + +**Key principle: the more structured, enumerable section should be the target.** + +The agent works best when it can iterate row-by-row over the target. The context can be +large and unstructured — it is pre-loaded as raw OCR pages and the agent reads what it needs. + +**Example — weighing sheet vs manufacturing ops:** + +| Role | Section | Why | +|------|---------|-----| +| Target | `material_dispensing` (2 pages, structured table) | One row per material/step — easy to enumerate | +| Context | `manufacturing_operations` (26 pages, step-by-step) | Agent looks up steps on demand | + +The agent checks each dispensing row then looks up its step number in the context. +This is far easier than scanning 26 manufacturing pages for charging steps and +cross-referencing backwards to the weighing sheet. + +**Anti-pattern:** Don't make a large, unstructured section the target when a +smaller structured counterpart exists. The agent will miss items. 
+ +--- + +## YAML configuration + +```yaml +rules: + 1: + evaluation_strategy: agentic_audit + applicable_document_types: [batch_record] + applicable_section_types: [material_dispensing] # target — evaluated directly + context_sources: + - document_type: batch_record + section_types: [manufacturing_operations] # context — loaded as raw pages +``` + +`context_sources` is a list. Each entry has: +- `document_type`: the package document type to pull from +- `section_types`: list of section types to include; `[]` means all sections of that doc type + +The pipeline pre-loads all matching pages as raw OCR markdown **before** the agent starts. +The agent sees the full content without calling any retrieval tools. + +--- + +## Writing pass_criteria for agentic_audit + +### Recommended template structure + +``` +TARGET: [Describe the target section's layout — columns, row structure, what to look at. + Include the signature image note if signatures appear in the target.] + +CONTEXT: [Describe what the context section contains and how to use it. + Include the signature image note if signatures appear in the context.] + +[ROW STRUCTURE NOTE — only if the target has summary-header + sub-row pattern; see below] + +WHAT TO CHECK: [Step-by-step evaluation instructions. Enumerate items one by one.] + +SKIP: [Conditions where an item is skipped rather than failed.] + +VERDICT: +- COMPLIANT if ... +- NON-COMPLIANT if ... (cite specific item, row, or step number) +- NOT_APPLICABLE only if [threshold — usually "no evaluable items exist"] +``` + +--- + +### Pattern 1 — Signature images in OCR + +Both target and context pages represent handwritten signatures as `` tags. +The agent will treat them as empty cells unless you tell it otherwise. 
+ +**Always include this note in `pass_criteria` wherever signatures appear:** + +> Done by / Checked by cells often contain embedded signature images (shown as ``) +> rather than plain text — treat any non-empty cell (image, text, or date alone) as +> "executed/signed". Only blank cells or cells containing solely dashes (`—`, `-`) should +> be treated as unsigned/not executed. + +This applies to both target pages and context pages. If signatures appear in the +context (e.g., manufacturing ops), add the note in the CONTEXT block of your criteria. + +--- + +### Pattern 2 — NOT_APPLICABLE threshold + +A common mistake is letting the agent mark a whole page NOT_APPLICABLE because it found +**some** excluded items, even though other evaluable items exist on the same page. + +Always be explicit: + +> NOT_APPLICABLE only if EVERY item on this page qualifies as a skip condition. +> Even a single evaluable item means the page must return COMPLIANT or NON_COMPLIANT. + +--- + +### Pattern 3 — Summary header vs step sub-rows + +Some pharmaceutical tables use a two-level row structure: +- **Summary header row**: all step numbers aggregated, total quantity, final sign-off +- **Individual step sub-rows** (indented below): one row per step, step-specific qty and sign-off + +When evaluating step-by-step, the **sub-rows are authoritative**. The summary header is a +roll-up sign-off, not a per-step record. If the summary header and a sub-row conflict +(different step numbers, different "Done by" status), trust the sub-row. + +Add this to `pass_criteria` when the target section has this structure: + +> ROW STRUCTURE: Each material has a summary header row (all steps aggregated, total qty, +> final sign-off) and individual step sub-rows (one per step, step-specific qty and sign-off). 
+> When sub-rows exist, use the sub-row's Done by / Checked by to determine whether material +> was dispensed for THAT step — do not use the summary header's signature to infer +> per-step dispensing status. +> SKIP sub-rows where Done by AND Checked by are both blank/dashes — those represent +> conditional steps not executed (e.g., a re-wash step skipped because the pH test passed). +> If a material has NO sub-rows, use the header row's Done by / Checked by. + +--- + +### Pattern 4 — Exclusions need counter-examples + +When a rule excludes certain step types (e.g., "skip transfer steps"), the agent will +often over-apply the exclusion to adjacent steps. Ground the exclusion with a concrete +example AND a counter-example: + +> EXCLUDED: Transfer of a pre-prepared intermediate solution between vessels, e.g. +> "Add sodium hydroxide solution from addition tank ATE013" — the original raw materials +> were already charged at an earlier step. +> IMPORTANT: This exclusion applies ONLY when the material being added is a named solution +> whose constituents were charged and weighed at an earlier numbered step in the same batch +> record. It does NOT apply to charging virgin raw materials directly from storage. +> Counter-example: "dissolving 35.39 Kg of Caustic Soda Flakes in 354 L of Purified Water +> in reactor SRE024" IS a charging step, even though the NaOH solution is later transferred. + +--- + +## Validation + +**Do not use `validate_cli` for `agentic_audit` rules.** The validate_cli runs a single-pass +LLM evaluator and ignores `context_sources` entirely — it will produce misleading results. 
+ +Use the agentic postpass script instead: + +```bash +cd backend/ + +# Checklist framework: +.venv/bin/python scripts/run_checklist_agentic_postpass.py --rule-number + +# Add --verbose-debug to print full JSON snapshots of each agent step to stderr: +.venv/bin/python scripts/run_checklist_agentic_postpass.py --rule-number --verbose-debug +``` + +If your framework does not have a postpass script yet, check `backend/scripts/` — the +script is framework-specific because it imports the agent's `AGENT_NAME` constant. +Creating a postpass script for a new framework is straightforward: copy +`run_checklist_agentic_postpass.py` and replace the `AGENT_NAME` import. + +The script outputs JSON. Key fields: +- `status`: `compliant` / `non_compliant` / `not_applicable` / `uncertain` +- `confidence`: 0.0–1.0 +- `reasoning`: the agent's step-by-step logic — **read this first when debugging** +- `evidence`: specific data cited + +--- + +## Debugging a wrong verdict + +### Step 1 — Peek both sections first + +Before editing `pass_criteria`, confirm the data is actually in the OCR: + +```bash +# Target section +python /scripts/peek_pages.py --doc --pages + +# Context section +python /scripts/peek_pages.py --doc --pages +``` + +If the data isn't in the OCR, the rule cannot fix a bad verdict — it's a classification +or OCR issue. + +### Step 2 — Read the reasoning field + +The `reasoning` field in the JSON output almost always tells you exactly what went wrong. 
+Common patterns and their fixes: + +| Reasoning says | Root cause | Fix | +|----------------|------------|-----| +| "lack of explicit details" / "insufficient evidence" | Agent can't find the data in context | Add clearer pointers in the CONTEXT block; add concrete examples | +| "step X not executed" but OCR shows a signature | Signature image guidance missing | Add `` = signed note to CONTEXT block | +| Page marked NOT_APPLICABLE despite evaluable items | NOT_APPLICABLE threshold too broad | Add "only if EVERY item is excluded" language | +| Wrong material or step number cited | Summary header confused with sub-rows | Add row structure explanation | +| Exclusion applied to a step that should be checked | Exclusion boundary not grounded | Add counter-example showing what is NOT excluded | + +### Step 3 — Run --verbose-debug + +```bash +.venv/bin/python scripts/run_checklist_agentic_postpass.py --rule-number --verbose-debug 2>&1 +``` + +This prints each agent action (context preload, tool calls, verdict) to stderr as NDJSON. +Look for the `preload_context` event to confirm how many characters of context loaded, +and the `verdict` event to see the raw reasoning before synthesis. + +### Step 4 — Iterate + +Edit `pass_criteria`, re-run the postpass script, check again. There is some LLM variance; +if a fix seems logically correct but the result is still wrong, run 2–3 times before +concluding it didn't work. 
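Because verdicts vary between runs, it can help to tally the `status` field across a few repetitions rather than judging each run by eye. The sketch below is a hypothetical helper, not part of the repo. It assumes the postpass script prints a single JSON object on stdout (the guide only says it outputs JSON), and the names `run_postpass` and `tally_statuses` are invented for illustration.

```python
import json
import subprocess
from collections import Counter
from typing import List

# Script path and flag are taken from this guide; everything else is illustrative.
POSTPASS_CMD = [".venv/bin/python", "scripts/run_checklist_agentic_postpass.py"]


def run_postpass(rule_number: int) -> str:
    """Run the postpass script once from backend/ and return its stdout.

    Assumes stdout holds exactly one JSON object; adjust if the real output differs.
    """
    result = subprocess.run(
        POSTPASS_CMD + ["--rule-number", str(rule_number)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def tally_statuses(outputs: List[str]) -> Counter:
    """Count the `status` value reported in each JSON output string."""
    return Counter(json.loads(raw)["status"] for raw in outputs)
```

Calling `tally_statuses([run_postpass(n) for _ in range(3)])` from `backend/`, with `n` set to the rule under test, gives a quick view of whether a `pass_criteria` change shifted the verdict consistently or only on some runs.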
+ +--- + +## Pre-flight checklist for agentic_audit rules + +In addition to the standard pre-flight in SKILL.md: + +- [ ] Target section is the more structured/smaller section +- [ ] Context section is correctly listed in `context_sources` with the right `section_types` +- [ ] `pass_criteria` describes both TARGET and CONTEXT blocks +- [ ] Signature image note (`` = signed) present for both target and context if needed +- [ ] NOT_APPLICABLE threshold says "only if EVERY item is excluded" +- [ ] Multi-row table structure (summary header vs sub-rows) explained if applicable +- [ ] Exclusions have at least one concrete counter-example +- [ ] Validated with the postpass script (not validate_cli) — positive and negative samples diff --git a/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md b/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md new file mode 100644 index 0000000..b4bb846 --- /dev/null +++ b/.claude/skills/audit-rule-author/references/rule_authoring_playbook.md @@ -0,0 +1,192 @@ +# Compliance Rule Authoring Playbook + +This guide is for domain owners and QA reviewers writing or updating compliance rules without changing backend code. + +## Goals + +- Keep rules deterministic and explainable. +- Minimize false positives from broad applicability. +- Make rule intent maintainable by non-authors. +- Preserve stable scoring across runs. + +## Where to edit + +- Rule text: `backend/app/compliance/rules/*_rules.md` +- Rule behavior/config: `backend/app/compliance/rules/*_rules.yaml` +- Document profile + section taxonomy: `backend/app/compliance/rules/document_profiles.yaml` + +Rule behavior must live in YAML, not in comments. + +## Authoring model + +Each rule has two parts: + +1. Human-readable statement (`.md`) +2. 
Machine-executable metadata (`.yaml`) + +Minimum YAML fields to set for every new or changed rule: + +- `applicable_document_types` +- `applicable_section_types` +- `applicable_page_types` (if needed) +- `pass_criteria` +- `skip_conditions` + +Optional advanced fields: + +- `evaluation_strategy`: controls how the rule is evaluated. Values: + - `llm` (default): single-pass LLM against the page OCR text + - `vision`: VLM image pass only (requires `visual_checks`) + - `text_and_vision`: both OCR text (LLM) and image (VLM). Vision can **only de-escalate** — if text + says non_compliant and vision says compliant, vision wins; if text says compliant and vision says + non_compliant, text still wins. Use when OCR false positives are the main risk. + - `text_primary`: both channels, but vision can **only escalate** — text is authoritative unless + vision finds a worse violation. Use when you want vision as an additional catch-only safety net. + - `agentic_audit`: multi-step agent with cross-section context — see `agentic_audit_guide.md` +- `context_sources`: list of `{document_type, section_types}` — only used with `agentic_audit`. + Specifies which sections are pre-loaded as raw OCR pages for the agent to cross-reference. +- `evaluation_mode: cannot_evaluate` +- `cannot_evaluate_reason` +- `requires_external_data` +- `cross_section_requirements` +- `keywords` (only if helpful; avoid over-restricting) + +## Section and document scoping + +Use canonical values defined in `document_profiles.yaml`. + +Examples: + +- Document types: `batch_record`, `sop`, `logbook`, `certificate` +- Section types: `manufacturing_operations`, `material_dispensing`, `qc_report`, `line_clearance` + +If a section name appears differently in documents, add alias mapping in `document_profiles.yaml` instead of inventing a new section type in a rule. + +## Writing good `pass_criteria` + +Use explicit, testable language: + +- Good: "Any non-empty text in `Done by`/`Checked by` columns counts as signed." 
+- Bad: "Looks properly signed." + +Include OCR-aware instructions where relevant: + +- Garbled handwritten text may still be valid signature evidence. +- Dash values (`-`, `---`, `—`) may mean not applicable. +- OCR year artifacts should not be treated as hard data-integrity failures without context. + +## Writing good `skip_conditions` + +Skip should represent true non-applicability, not failure. + +- Good: "Page has no checklist items -> not_applicable" +- Bad: "No checklist found -> non_compliant" + +Prefer concrete conditions tied to page structure/content. + +## When to use `cannot_evaluate` + +Use this for rules that require data outside the packet: + +- Training records +- Signature logs +- Calibration systems +- Archive/IT systems + +Set all of: + +- `evaluation_mode: cannot_evaluate` +- `cannot_evaluate_reason` +- `requires_external_data` + +## Cross-section rules + +For rules comparing two sections, set: + +- `scope: document` or `scope: section` +- `cross_section_requirements` (from deterministic resolver) + +Current requirement IDs include: + +- `operation_vs_weighing_reconciliation` +- `material_usage_vs_dispensing` +- `sample_sent_vs_qc_report` +- `qc_vs_coa_consistency` +- `inter_section_consistency` + +## Anti-patterns to avoid + +- Broad rules with no document/section scope. +- Over-reliance on generic keywords as primary applicability logic. +- Encoding executable logic in comments only. +- Treating OCR artifacts as compliance failures by default. +- Mixing pass/fail criteria with business process assumptions not present in evidence. +- Writing "Vision: …" guidance in `pass_criteria` — vision behaviour is controlled by `visual_checks` + tags, not prose in `pass_criteria`. Keep `pass_criteria` focused on what the text LLM should look for. +- Forgetting that category-level `applicable_page_types` is inherited by every rule in that category. + If the target section's pages are classified as `content` (not `form`), they will be silently skipped. 
+ Override with `applicable_page_types: []` at the rule level and verify with `applicability_trace`. + +## Change checklist + +Before marking a rule update complete: + +1. Rule text and YAML are both updated. +2. Document and section scopes are set. +3. `pass_criteria` and `skip_conditions` are explicit. +4. Any external dependency is marked `cannot_evaluate`. +5. Config validator passes. + +Validator command: + +```bash +backend/.venv/bin/python - <<'PY' +from app.compliance.rules.registry import get_registry +from app.compliance.rules.profiles import validate_compliance_configs +validate_compliance_configs(get_registry()) +print("OK") +PY +``` + +## Quick templates + +### Standard page rule + +```yaml +12: + applicable_document_types: [batch_record] + applicable_section_types: [manufacturing_operations] + applicable_page_types: [form] + pass_criteria: > + + skip_conditions: + - "Page has no -> not_applicable" +``` + +### External dependency rule + +```yaml +21: + evaluation_mode: cannot_evaluate + cannot_evaluate_reason: "Requires calibration system records" + requires_external_data: [calibration_records] +``` + +### Cross-section rule + +```yaml +7: + scope: document + cross_section_requirements: + - sample_sent_vs_qc_report + pass_criteria: > + +``` + +## Ownership recommendation + +- Domain owner: rule text + acceptance semantics. +- QA/compliance lead: severity + process correctness. +- Engineering owner: schema compliance + deterministic fit. + +This keeps SoC clean while preserving fast iteration. diff --git a/.claude/skills/audit-rule-author/scripts/analyze_results.py b/.claude/skills/audit-rule-author/scripts/analyze_results.py new file mode 100644 index 0000000..4d9f614 --- /dev/null +++ b/.claude/skills/audit-rule-author/scripts/analyze_results.py @@ -0,0 +1,203 @@ +#!/usr/bin/env python3 +""" +analyze_results.py — Filter and inspect compliance evaluation results. 
+ +Usage (run from project root or backend/): + # Show all non_compliant in a category + python /scripts/analyze_results.py --doc --status non_compliant --category attributable + + # Show all evaluations for specific pages + python /scripts/analyze_results.py --doc --pages 14,78 + + # Show all uncertain evaluations across all categories + python /scripts/analyze_results.py --doc --status uncertain + + # Filter by rule ID + python /scripts/analyze_results.py --doc --rule ALC-ATT1 + + # Combine filters + python /scripts/analyze_results.py --doc --status non_compliant --pages 14,30,78 + + # Show summary only (no per-evaluation detail) + python /scripts/analyze_results.py --doc --status non_compliant --summary-only + + # List available categories and status counts + python /scripts/analyze_results.py --doc --list +""" + +import argparse +import json +import os +import sys +from pathlib import Path + + +def find_data_root() -> Path: + """Documents storage root: AT_STORAGE__BASE_PATH (the pipeline's own var), + else backend/data/documents / data/documents relative to cwd.""" + candidates = [] + env = os.environ.get("AT_STORAGE__BASE_PATH") + if env: + candidates.append(Path(env)) + candidates += [Path("backend/data/documents"), Path("data/documents")] + for candidate in candidates: + if candidate.is_dir(): + return candidate + sys.exit( + "ERROR: Cannot find the documents storage root. Set AT_STORAGE__BASE_PATH " + "or run from the repo root / backend/." 
+ ) + + +def load_results(doc_id: str) -> dict: + root = find_data_root() + path = root / doc_id / "compliance_result.json" + if not path.exists(): + sys.exit(f"ERROR: compliance_result.json not found at {path}") + with open(path) as f: + return json.load(f) + + +def gather_evaluations(data: dict) -> list[dict]: + """Collect all_evaluations from every agent report, annotating with agent name.""" + evals = [] + for agent_report in data.get("agent_reports", []): + agent = agent_report.get("agent", "unknown") + for ev in agent_report.get("all_evaluations", []): + ev = dict(ev) + ev.setdefault("agent", agent) + evals.append(ev) + return evals + + +def parse_pages(pages_str: str) -> set[int]: + pages = set() + for part in pages_str.split(","): + part = part.strip() + if "-" in part: + start, end = part.split("-", 1) + pages.update(range(int(start), int(end) + 1)) + else: + pages.add(int(part)) + return pages + + +def format_evaluation(ev: dict, idx: int) -> str: + lines = [ + f"\n{'='*60}", + f"[{idx}] {ev.get('rule_id', '?')} — {ev.get('rule_text', '')}", + f" Agent : {ev.get('agent', '?')}", + f" Category : {ev.get('rule_category', '?')}", + f" Status : {ev.get('status', '?')} (confidence: {ev.get('confidence', '?')})", + f" Pages : {ev.get('page_numbers', [])}", + ] + if ev.get("reasoning"): + lines.append(f" Reasoning: {ev['reasoning'][:300]}{'...' if len(ev.get('reasoning','')) > 300 else ''}") + if ev.get("evidence"): + lines.append(f" Evidence : {ev['evidence'][:200]}{'...' 
if len(ev.get('evidence','')) > 200 else ''}") + if ev.get("applicability_trace"): + lines.append(f" Trace : {ev['applicability_trace']}") + return "\n".join(lines) + + +def main(): + parser = argparse.ArgumentParser(description="Filter compliance evaluation results") + parser.add_argument("--doc", required=True, help="Document ID") + parser.add_argument("--status", help="Filter by status: compliant|non_compliant|not_applicable|uncertain|error") + parser.add_argument("--category", help="Filter by rule category slug (e.g. attributable, legible)") + parser.add_argument("--pages", help="Filter by page numbers (e.g. 14,78 or 10-20)") + parser.add_argument("--rule", help="Filter by rule ID (e.g. ALC-ATT1)") + parser.add_argument("--agent", help="Filter by agent name (e.g. alcoa, checklist)") + parser.add_argument("--summary-only", action="store_true", help="Show counts only, no detail") + parser.add_argument("--list", action="store_true", help="List categories and status counts, then exit") + args = parser.parse_args() + + data = load_results(args.doc) + all_evals = gather_evaluations(data) + + print(f"\nDocument : {data.get('filename', args.doc)}") + print(f"Doc ID : {args.doc}") + print(f"Pages : {data.get('total_pages', '?')}") + print(f"Agents : {[r.get('agent', 'unknown') for r in data.get('agent_reports', [])]}") + print(f"Total evaluations loaded: {len(all_evals)}") + + if args.list: + from collections import Counter + cat_counts: dict[str, Counter] = {} + for ev in all_evals: + cat = ev.get("rule_category", "unknown") + status = ev.get("status", "unknown") + if cat not in cat_counts: + cat_counts[cat] = Counter() + cat_counts[cat][status] += 1 + print("\n--- Categories and status counts ---") + for cat in sorted(cat_counts): + counts = cat_counts[cat] + total = sum(counts.values()) + print(f"\n {cat} ({total} evaluations)") + for status in ["non_compliant", "uncertain", "compliant", "not_applicable", "error"]: + if counts[status]: + print(f" {status}: {counts[status]}") + return + +
# Apply filters + filtered = all_evals + + if args.agent: + filtered = [e for e in filtered if e.get("agent") == args.agent] + + if args.status: + filtered = [e for e in filtered if e.get("status") == args.status] + + if args.category: + cat_lower = args.category.lower() + filtered = [e for e in filtered if (e.get("rule_category") or "").lower() == cat_lower] + + if args.rule: + filtered = [e for e in filtered if e.get("rule_id") == args.rule] + + if args.pages: + page_set = parse_pages(args.pages) + filtered = [e for e in filtered if any(p in page_set for p in (e.get("page_numbers") or []))] + + print(f"\nFiltered : {len(filtered)} evaluation(s)") + + if not filtered: + print("No evaluations match the given filters.") + return + + # Summary table + print("\n--- Summary ---") + print(f"{'Rule ID':<20} {'Category':<18} {'Status':<16} {'Conf':<6} {'Pages'}") + print("-" * 80) + for ev in filtered: + pages_str = str(ev.get("page_numbers", [])) + # A missing or null confidence is shown as 0 so the .2f format cannot fail + conf = ev.get("confidence") or 0 + print(f"{ev.get('rule_id','?'):<20} {ev.get('rule_category','?'):<18} {ev.get('status','?'):<16} {conf:<6.2f} {pages_str}") + + if args.summary_only: + return + + # Detailed output + print("\n--- Detail ---") + for i, ev in enumerate(filtered, 1): + print(format_evaluation(ev, i)) + + # Emit machine-readable footer for skill use + failing_pages: dict[str, list[int]] = {} + for ev in filtered: + rule_id = ev.get("rule_id", "") + pages = ev.get("page_numbers") or [] + if pages: + existing = failing_pages.get(rule_id, []) + for p in pages: + if p not in existing: + existing.append(p) + failing_pages[rule_id] = existing + + print("\n--- Failing pages per rule (for validation) ---") + print(json.dumps(failing_pages, indent=2)) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/audit-rule-author/scripts/peek_pages.py b/.claude/skills/audit-rule-author/scripts/peek_pages.py new file mode 100644 index 0000000..8434b89 --- /dev/null +++
b/.claude/skills/audit-rule-author/scripts/peek_pages.py @@ -0,0 +1,209 @@ +#!/usr/bin/env python3 +""" +peek_pages.py — Inspect OCR markdown content and section metadata for document pages. + +Usage (run from project root or backend/): + python /scripts/peek_pages.py --doc --pages 33,34 + python /scripts/peek_pages.py --doc --pages 33-35 + python /scripts/peek_pages.py --doc --pages 33 --no-markdown + python /scripts/peek_pages.py --doc --list-sections + +Data is read from the document storage root (``AT_STORAGE__BASE_PATH``, or +``backend/data/documents`` when run from the repo root)//: + - result.json → raw OCR markdown per page + - segmentation.json → page-range sections (section_type, name, document_type) +""" + +import argparse +import json +import os +import sys +from pathlib import Path + + +def find_data_root(explicit: str | None = None) -> Path: + """Locate the document storage root that contains the folders. + + Resolution order: explicit ``--data-root`` → ``AT_STORAGE__BASE_PATH`` (the + same var the pipeline/API use) → ``backend/data/documents`` / ``data/documents`` + relative to cwd. + """ + candidates = [] + if explicit: + candidates.append(Path(explicit)) + env = os.environ.get("AT_STORAGE__BASE_PATH") + if env: + candidates.append(Path(env)) + candidates += [Path("backend/data/documents"), Path("data/documents")] + for candidate in candidates: + if candidate.is_dir(): + return candidate + sys.exit( + "ERROR: Cannot find the documents storage root. Pass --data-root, set " + "AT_STORAGE__BASE_PATH, or run from the repo root / backend/." 
+ ) + + +def parse_pages(spec: str) -> list[int]: + """Parse '33', '33,34', or '33-35' into a sorted list of ints.""" + pages = [] + for part in spec.split(","): + part = part.strip() + if "-" in part: + lo, hi = part.split("-", 1) + pages.extend(range(int(lo), int(hi) + 1)) + else: + pages.append(int(part)) + return sorted(set(pages)) + + +def load_markdown(doc_dir: Path) -> dict[str, str]: + result_path = doc_dir / "result.json" + if not result_path.exists(): + sys.exit(f"ERROR: result.json not found at {result_path}") + with result_path.open() as f: + data = json.load(f) + raw = data.get("raw_markdown", {}) + if not isinstance(raw, dict): + sys.exit("ERROR: raw_markdown in result.json is not a dict — unexpected format") + return raw # keys are string page numbers + + +def load_classification(doc_dir: Path) -> dict[int, dict]: + """Return per-page section metadata keyed by page number. + + Primary source on main is ``segmentation.json`` (a ``DocumentSegmentation`` + with page-range ``sections``); each section's ``start_page..end_page`` is + expanded to per-page rows. Falls back to a legacy per-page + ``classification.yaml`` if present (older pipeline output). 
+ """ + seg_path = doc_dir / "segmentation.json" + if seg_path.exists(): + with seg_path.open() as f: + data = json.load(f) + out: dict[int, dict] = {} + for sec in data.get("sections", []): + start, end = sec.get("start_page", 0), sec.get("end_page", 0) + if not start or not end: + continue + for pn in range(start, end + 1): + out[pn] = { + "section_type_id": sec.get("section_type", "unknown"), + "section_name": sec.get("name", ""), + "page_role": sec.get("document_type", "—"), + "detection_notes": sec.get("description", ""), + } + return out + + # Legacy fallback: per-page classification.yaml (older pipeline output) + cls_path = doc_dir / "classification.yaml" + if not cls_path.exists(): + return {} + try: + import yaml + except ImportError: + return {} + with cls_path.open() as f: + data = yaml.safe_load(f) + pages = data.get("pages", []) + return {p["page_number"]: p for p in pages if "page_number" in p} + + +def print_separator(label: str = "") -> None: + line = "─" * 70 + if label: + print(f"\n{line}") + print(f" {label}") + print(line) + else: + print(line) + + +def list_sections(classification: dict[int, dict]) -> None: + if not classification: + print("No section metadata found (segmentation.json or classification.yaml).") + return + # Group consecutive pages by section + current_section = None + section_start = None + rows = [] + for pn in sorted(classification): + meta = classification[pn] + sec = meta.get("section_type_id", "unknown") + if sec != current_section: + if current_section is not None: + rows.append((section_start, pn - 1, current_section, + classification[section_start].get("section_name", ""))) + current_section = sec + section_start = pn + if current_section is not None: + last = max(classification) + rows.append((section_start, last, current_section, + classification[section_start].get("section_name", ""))) + + print(f"\n{'PAGE RANGE':<16} {'SECTION TYPE':<35} {'DISPLAY NAME'}") + print("─" * 80) + for start, end, sec_type, display in rows: + rng = f"{start}" if start == end
else f"{start}–{end}" + print(f" {rng:<14} {sec_type:<35} {display}") + + +def show_pages(pages: list[int], markdown: dict, classification: dict, show_md: bool) -> None: + for pn in pages: + key = str(pn) + md = markdown.get(key) + meta = classification.get(pn, {}) + + print_separator(f"PAGE {pn}") + + # Section metadata + if meta: + print(f" Section type : {meta.get('section_type_id', '—')}") + print(f" Section name : {meta.get('section_name', '—')}") + print(f" Page role : {meta.get('page_role', '—')}") + notes = meta.get("detection_notes", "") + if notes: + print(f" Notes : {notes}") + else: + print(" (No classification metadata for this page)") + + if show_md: + print() + if md: + print(md) + else: + print(f" [No markdown content found for page {pn}]") + + print_separator() + + +def main() -> None: + parser = argparse.ArgumentParser(description="Inspect OCR markdown content for document pages") + parser.add_argument("--doc", required=True, help="Document ID (folder name under backend/data/documents/)") + parser.add_argument("--pages", help="Pages to inspect: '33', '33,34', or '33-35'") + parser.add_argument("--list-sections", action="store_true", help="List all sections with page ranges") + parser.add_argument("--no-markdown", action="store_true", help="Show section metadata only, skip markdown content") + parser.add_argument("--data-root", help="Documents storage root (defaults to AT_STORAGE__BASE_PATH or backend/data/documents)") + args = parser.parse_args() + + data_root = find_data_root(args.data_root) + doc_dir = data_root / args.doc + if not doc_dir.is_dir(): + sys.exit(f"ERROR: Document directory not found: {doc_dir}") + + markdown = load_markdown(doc_dir) + classification = load_classification(doc_dir) + + if args.list_sections: + list_sections(classification) + return + + if not args.pages: + parser.error("Provide --pages or --list-sections") + + pages = parse_pages(args.pages) + show_pages(pages, markdown, classification, show_md=not args.no_markdown) 
+ + +if __name__ == "__main__": + main()