github-actions[bot] edited this page May 15, 2026 · 2 revisions

Wiki — Technical Documentation


Note: This folder is the canonical source. It is auto-synced to the GitHub Wiki tab by .github/workflows/sync-wiki.yml on every push to main. Edit pages here (via a PR), not in the Wiki UI — direct wiki edits will be overwritten on the next sync.

Contents

Architecture

| Document | Description |
| --- | --- |
| Architecture Overview | Hexagonal architecture, layer breakdown, system diagram |
| Ports & Adapters | 6 port contracts, Protocol definitions, adapter implementations |
| Pipeline Modes | azure_di vs marker_docling vs datalab: comparison and switching |
| Azure DI Setup | Cloud API and disconnected container setup |
| Deployment Environments | Dev, staging, and on-prem production configs |
| Data Flow | End-to-end pipeline from upload to compliance report |

Backend — OCR Engines

| Document | Description |
| --- | --- |
| OCR Engine Strategy | Engine capabilities, mode comparison, confidence formulas |
| Marker | PDF-to-Markdown, LLM processors, cross-page table merging |
| Azure Document Intelligence | Handwriting, barcodes, selection marks, per-word confidence |
| Docling | Quality scoring: layout, table, OCR, parse (MIT, CPU-only) |

Backend — Workflow

| Document | Description |
| --- | --- |
| Document Processing | LangGraph state-graph, mode-based conditional routing |
| Document Segmentation & BPCR Detection | Spec 011 pipeline: LLM segmentation + deterministic post-pipeline + BPCR sub-section detection + HITL overrides |
| Compliance Review | ALCOA++, GMP, SOP verification agents |
| HITL Flow | Interrupt/resume, review queue, audit trail |
| VLM Provider | Vision-Language Model port + Gemini / vLLM adapters; visual checks; grayscale gate; absence-first prompts |
| Report Renderer | Spec 008 PDF / HTML / Markdown export: five-column rule table, three-state taxonomy, WeasyPrint, versioned cache |
| Rule Authoring Playbook | Operator guide for config-first rule updates and QA checks |
Rule Authoring Playbook Operator guide for config-first rule updates and QA checks

Backend — Confidence & Config

| Document | Description |
| --- | --- |
| Composite Scorer | Mode-specific confidence: DI word scores or Docling quality |
| Validation Rules | Date, quantity, and content plausibility checks |
| Settings | YAML structure, env overrides, priority order |
| Dependency Injection | DI container, adapter wiring, match/case dispatch |

Frontend

| Document | Description |
| --- | --- |
| Frontend Overview | Next.js app structure, routing, Zustand state |
| Upload Flow | Drag-and-drop, progress tracking, validation |
| Review Interface | Split-pane HITL, inline editing, VLM findings, keyboard shortcuts |
| Compliance Dashboard | ALCOA++ visualizations, severity breakdown, visual evidence viewer |
| BMR Runs UI | /bmr/runs runs list + /bmr/runs/{id} detail with live stage progress, BPCR sections panel, findings summary |
| Corrections Manager | OCR correction rules, confusion chart, rule management |
| WebSocket Streaming | Real-time updates from LangGraph to browser |

DevOps

| Document | Description |
| --- | --- |
| Local Setup | Prerequisites, first-run walkthrough |
| GitHub Actions CI | CI workflow, PR-quality gate, weekly maintenance, branch-protection rules, dependabot |
| Quick Commands | Copy-paste reference for all dev commands |

Suggested Reading Order

  1. Architecture Overview
  2. Ports & Adapters
  3. Pipeline Modes
  4. Data Flow
  5. OCR Engine Strategy
  6. Document Processing
  7. Document Segmentation & BPCR Detection
  8. Compliance Review
  9. VLM Provider
  10. Report Renderer
  11. BMR Runs UI
  12. Composite Scorer
  13. HITL Flow
  14. Settings
  15. Local Setup
  16. GitHub Actions CI
