This document provides high-level context, architectural summaries, and development rules for AI agents working on the Snappy codebase.
Snappy is a vision-grounded document retrieval system (RAG) that processes PDFs as images.
- Core Idea: Embed page images directly using ColPali (multivector) instead of relying solely on OCR text.
- Hybrid Mode: Optionally uses DeepSeek OCR for text extraction and bounding boxes, combining vision and text search.
- Stack: FastAPI (Backend), Next.js 16 (Frontend), Qdrant (Vector DB), Local Storage.
- `backend/`: FastAPI application (Python 3.11+).
  - `api/`: Routers and app entry point.
  - `clients/`: Integrations (Qdrant, Local Storage, ColPali, OCR).
  - `domain/`: Core logic (Pipeline, Batch Processing).
  - `config/`: Schema-driven configuration.
- `frontend/`: Next.js application (TypeScript, React 19).
  - `app/`: App Router pages and API routes.
  - `lib/api/`: Generated OpenAPI client.
  - `components/`: UI components (based on shadcn/ui).
- `colpali/`: Embedding service (Python/FastAPI).
- `deepseek-ocr/`: OCR service (Python/FastAPI).
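"Schema-driven configuration" generally means every setting is declared once, with a parser and a default, and then resolved from the environment. The sketch below is a hypothetical illustration of that idea. `Setting`, `SCHEMA`, `load_settings`, and the `EXAMPLE_*` keys are invented here and are not taken from Snappy's `config/` package.

```python
import os
from collections.abc import Callable, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """One entry in a hypothetical settings schema."""

    name: str                        # environment variable name
    parse: Callable[[str], object]   # converts the raw string value
    default: object                  # used when the variable is unset
    description: str = ""


def parse_bool(raw: str) -> bool:
    """Interpret common truthy strings as True, anything else as False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical schema entries; the real keys live in the backend's config package.
SCHEMA: tuple[Setting, ...] = (
    Setting("EXAMPLE_BATCH_SIZE", int, 4, "Pages per embedding batch"),
    Setting("EXAMPLE_ENABLE_OCR", parse_bool, False, "Toggle hybrid OCR mode"),
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Resolve every schema entry from `env`, falling back to its default."""
    return {
        s.name: s.parse(env[s.name]) if s.name in env else s.default
        for s in SCHEMA
    }


print(load_settings({"EXAMPLE_ENABLE_OCR": "true"}))
```

With a single declared schema, each entry has an obvious counterpart to document in `.env.example`, which is why adding a setting touches both places.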
- Backend:
  - Package manager: `uv` (preferred) or `pip`.
  - Run: `uvicorn backend.main:app --reload` (port 8000).
  - Test: `pytest tests/`.
  - Lint: `pre-commit run --all-files`.
- Frontend:
  - Package manager: `yarn`.
  - Run: `yarn dev` (port 3000).
  - Generate SDK: `yarn gen:sdk` (run after backend API changes).
- Docker:
  - Full stack: `docker compose up -d`.
  - Services: `colpali`, `deepseek-ocr`, `qdrant`.
- Robustness: Implement changes robustly. Avoid temporary shims unless requested.
- Cleanup: Remove unused files/code after refactoring.
- Communication: Ask clarifying questions if instructions are ambiguous.
- Python:
  - Style: PEP 8, Black formatting, isort.
  - Types: Full type hints required (enforced by Pyright).
  - Async: Use `async`/`await` for I/O.
  - API: Update `backend/config_schema.py` for new settings. Rerun `scripts/generate_openapi.py` if routes change.
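The async rule matters because the backend spends most of its time waiting on network services (Qdrant, ColPali, the OCR service). A minimal, fully typed sketch of the intended pattern is shown below. `fetch_page_embedding` is a hypothetical stand-in that simulates a network call with `asyncio.sleep`, and `asyncio.gather` runs the calls concurrently while preserving input order.

```python
import asyncio


async def fetch_page_embedding(page_number: int) -> list[float]:
    """Hypothetical stand-in for a network call to an embedding service."""
    await asyncio.sleep(0.01)  # simulated I/O latency; yields to the event loop
    return [float(page_number), 1.0]


async def embed_pages(page_numbers: list[int]) -> list[list[float]]:
    """Issue all requests concurrently; results come back in input order."""
    return await asyncio.gather(*(fetch_page_embedding(n) for n in page_numbers))


if __name__ == "__main__":
    print(asyncio.run(embed_pages([1, 2, 3])))
```

A blocking call such as `time.sleep` or a synchronous HTTP request inside an `async def` would stall the event loop, delaying every other in-flight request.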
- TypeScript:
  - Style: Prettier, ESLint.
  - Types: Strict TypeScript; no `any`.
  - API: Use the generated SDK (`@/lib/api/generated`) for all backend calls. Do not use `fetch` directly.
  - State: Use `stores/app-store.tsx` and domain-specific hooks.
- New config: Add it to `backend/config/schema/` and update `.env.example`.
- New endpoint: Add a router in `backend/api/routers/`, include it in `backend/api/app.py`, then regenerate the SDK.
- Job cancellation: Handled by `CancellationService` in `backend/domain/pipeline/cancellation.py`.
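One common way to cancel long-running jobs is cooperatively: the worker checks a flag between units of work instead of being interrupted mid-write. The sketch below illustrates only that general idea. `ToyCancellationRegistry`, `JobCancelled`, and `process_batches` are hypothetical names, and the real `CancellationService` may expose a different interface, so check its source before relying on any method name.

```python
import threading


class JobCancelled(Exception):
    """Raised when a job notices it has been cancelled."""


class ToyCancellationRegistry:
    """Hypothetical stand-in, NOT the real CancellationService API:
    records job IDs that have been asked to stop."""

    def __init__(self) -> None:
        self._cancelled: set[str] = set()
        self._lock = threading.Lock()  # allows cancel() from another thread

    def cancel(self, job_id: str) -> None:
        with self._lock:
            self._cancelled.add(job_id)

    def is_cancelled(self, job_id: str) -> bool:
        with self._lock:
            return job_id in self._cancelled

    def raise_if_cancelled(self, job_id: str) -> None:
        if self.is_cancelled(job_id):
            raise JobCancelled(job_id)


def process_batches(
    job_id: str, batches: list[list[int]], registry: ToyCancellationRegistry
) -> int:
    """Process batches one at a time, checking for cancellation between them."""
    processed = 0
    for batch in batches:
        registry.raise_if_cancelled(job_id)  # cooperative checkpoint
        processed += len(batch)              # placeholder for real per-batch work
    return processed
```

Checking between batches means a cancelled job stops at a batch boundary, so partially processed state is limited to completed batches.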
References: `README.md`, `CONTRIBUTING.md`, `CLAUDE.md`.