- Root holds `README.md`; code is grouped by domain: `infra/` (Terraform), `services/` (`r1-inference/`, `rag-api/`), `frontend/` (`web/`), and `.github/workflows/` for CI. Keep ingestion assets in `data/` (not committed) and architecture notes in `docs/`.
- Place Python app code under `services/*/app/`; split FastAPI routers, model loaders, and utilities into clear submodules. Mirror this layout in tests for fast lookup.
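The mirrored layout can be expressed as a simple path mapping. This is a minimal sketch that assumes every service keeps code under `services/<svc>/app/` and tests under `services/<svc>/tests/`. The `expected_test_path` helper is hypothetical and not part of the repo. It could back a lint step or a quick lookup script.

```python
from pathlib import PurePosixPath


def expected_test_path(app_module: str) -> str:
    """Map services/<svc>/app/<pkg>/<mod>.py to services/<svc>/tests/<pkg>/test_<mod>.py.

    Hypothetical helper: assumes app/ and tests/ are siblings inside each
    service and share the same package sub-layout.
    """
    path = PurePosixPath(app_module)
    parts = path.parts
    if len(parts) < 4 or parts[0] != "services" or parts[2] != "app" or path.suffix != ".py":
        raise ValueError(f"not a services/*/app/ module: {app_module}")
    # Keep services/<svc>, swap app/ for tests/, keep the package sub-path.
    package_dirs = parts[3:-1]
    return str(PurePosixPath("services", parts[1], "tests", *package_dirs, f"test_{path.name}"))
```

For example, `expected_test_path("services/rag-api/app/routers/generate.py")` returns `services/rag-api/tests/routers/test_generate.py`.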
- Install deps per service: `cd services/rag-api && python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt` (repeat for `r1-inference`).
- Format/lint: `make lint` (black, isort, ruff) or run the tools directly inside each service.
- Tests: `make test` at the repo root (aggregates pytest) or `pytest` inside a specific service. Mark GPU-only tests with `@pytest.mark.gpu` so they can be selected with `pytest -m gpu` or skipped with `pytest -m "not gpu"`.
- Containers: `docker build -t deepseek-r1 services/r1-inference` and `docker build -t rag-api services/rag-api`; tag with the git SHA before pushing to ECR. Local check: `docker run --gpus all -p 8000:8000 deepseek-r1`, then POST to `/generate`.
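The local check can also be scripted. Below is a stdlib-only client sketch for the container started above. The helper names and the JSON fields `prompt` and `max_tokens` are assumptions. Align them with the request model actually defined in `services/r1-inference`.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str, base_url: str = "http://localhost:8000", max_tokens: int = 256
) -> urllib.request.Request:
    """Build a POST /generate request (hypothetical payload shape)."""
    # Assumed field names; match the real FastAPI request model.
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(prompt: str, base_url: str = "http://localhost:8000", timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_generate_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `call_generate("Say hello in one sentence.")` returns whatever JSON the service responds with. Keeping request construction separate lets it be unit-tested without a GPU or network.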
- Python: 4-space indentation, type hints on public functions; prefer pure helpers and small modules.
- Naming: snake_case for modules/functions, UpperCamelCase for classes, SCREAMING_SNAKE for constants. Route paths are kebab-case; env vars are upper snake case (e.g., `MODEL_NAME`).
- Use black + isort for formatting and ruff for linting; enable pre-commit hooks when available.
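Here the conventions above are applied to a small hypothetical settings module. `MODEL_NAME` is the env var named in this guide. `MAX_NEW_TOKENS`, `HEALTH_ROUTE`, and `/health-check` are illustrative names only, not existing settings or routes.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Constants: SCREAMING_SNAKE_CASE.
DEFAULT_MAX_NEW_TOKENS = 512
# Route paths: kebab-case (HEALTH_ROUTE is a hypothetical example).
GENERATE_ROUTE = "/generate"
HEALTH_ROUTE = "/health-check"


@dataclass(frozen=True)
class ModelSettings:  # Classes: UpperCamelCase.
    model_name: str
    max_new_tokens: int


def load_model_settings(env: Mapping[str, str] | None = None) -> ModelSettings:
    """Read settings from upper-snake-case env vars (functions: snake_case).

    Accepting the mapping as a parameter keeps this a pure helper that tests
    can call without touching os.environ.
    """
    source = os.environ if env is None else env
    model_name = source.get("MODEL_NAME")
    if not model_name:
        raise RuntimeError("MODEL_NAME must be set")
    # MAX_NEW_TOKENS is a hypothetical optional override.
    max_new_tokens = int(source.get("MAX_NEW_TOKENS", DEFAULT_MAX_NEW_TOKENS))
    return ModelSettings(model_name=model_name, max_new_tokens=max_new_tokens)
```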
- Framework: pytest. Put tests in `services/*/tests/` matching package paths. Name files `test_<module>.py` and group with `Test*` classes when helpful.
- Cover prompt shaping, RAG retrieval fallbacks, and GPU-specific paths. Use fixtures for sample docs/embeddings; avoid checking large artifacts into git. Record expected curl examples for API routes.
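The following sketch shows the kind of prompt-shaping and retrieval-fallback tests described above. It is built around a hypothetical `build_prompt` helper and fallback string, so the real rag-api prompt code may differ. The sample docs are a module constant so the snippet runs standalone. In the suite they would normally come from a `@pytest.fixture`. pytest collects the plain `test_*` functions as-is.

```python
from __future__ import annotations

# Hypothetical fallback text used when retrieval returns nothing usable.
NO_CONTEXT_FALLBACK = "No relevant documents were found. Answer from general knowledge and say so."


def build_prompt(question: str, docs: list[str], max_docs: int = 3) -> str:
    """Number up to max_docs non-blank retrieved passages ahead of the question."""
    context = [doc.strip() for doc in docs if doc.strip()][:max_docs]
    if not context:
        # Retrieval fallback: keep the request answerable when nothing matched.
        return f"{NO_CONTEXT_FALLBACK}\n\nQuestion: {question}"
    joined = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(context, start=1))
    return f"Context:\n{joined}\n\nQuestion: {question}"


# Inline sample docs; a pytest fixture would normally provide these.
SAMPLE_DOCS = ["ECR stores the service images.", "   ", "ALB fronts /generate.", "WAF filters traffic.", "SSM holds config."]


def test_build_prompt_numbers_and_caps_context() -> None:
    prompt = build_prompt("Where do images live?", SAMPLE_DOCS)
    assert "[1] ECR stores the service images." in prompt
    assert "[3] WAF filters traffic." in prompt
    assert "SSM holds config." not in prompt  # capped at max_docs=3; blank doc skipped


def test_build_prompt_falls_back_when_retrieval_is_empty() -> None:
    prompt = build_prompt("Anything?", [])
    assert prompt.startswith(NO_CONTEXT_FALLBACK)
```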
- Write commit subjects in the imperative mood; Conventional Commit prefixes (`feat`, `fix`, `chore`, `docs`, `refactor`, `test`) are encouraged for CI tagging.
- PRs: concise description, test evidence (`pytest`/`make test` output), linked issues/ADRs, and infra impacts (Terraform plan, new env vars, AWS resources). Include curl snippets or screenshots for API/UI changes and note rollout steps (migration, backfill, cache warm-up).
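Here is a minimal subject-line check for the prefixes listed above. It allows an optional scope such as `(rag-api)` and the `!` breaking-change marker from the Conventional Commits format. The function name is hypothetical, and wiring it into a commit-msg hook or CI job is not shown. It does not check imperative mood.

```python
import re

# Prefixes from this guide; optional "(scope)" and "!" per Conventional Commits.
ALLOWED_TYPES = ("feat", "fix", "chore", "docs", "refactor", "test")
SUBJECT_RE = re.compile(rf"^(?:{'|'.join(ALLOWED_TYPES)})(?:\([a-z0-9-]+\))?!?: \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line starts with an allowed prefix."""
    return bool(SUBJECT_RE.match(subject.strip()))
```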
- Never commit secrets; use `.env.example` to document required variables and store real values in AWS Secrets Manager/SSM. Keep model caches and data exports out of git via `.gitignore`.
- Expose `/generate` only to internal networks; front it with ALB/WAF. Add structured logging (request id, user id, latency, token counts) and redact prompts that contain PII.
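The following sketch implements structured JSON logging with prompt redaction, using only the standard `logging` module. The field names (`request_id`, `user_id`, `latency_ms`, `prompt_tokens`, `completion_tokens`) and both regexes are assumptions. The patterns catch only obvious emails and phone-like digit runs and are no substitute for a vetted PII detector.

```python
import json
import logging
import re

# Illustrative patterns only; real PII redaction needs a vetted detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Hypothetical structured fields, passed per call via `extra=`.
LOG_FIELDS = ("request_id", "user_id", "latency_ms", "prompt_tokens", "completion_tokens")


def redact_pii(text: str) -> str:
    """Mask email addresses and phone-number-like digit runs."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, including any LOG_FIELDS present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": redact_pii(record.getMessage()),
        }
        for field in LOG_FIELDS:
            # Keys passed via `extra=` become attributes on the record.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def make_logger(name: str = "rag-api") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (handler added once)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call site might look like `make_logger().info("prompt: %s", prompt, extra={"request_id": request_id, "user_id": user_id, "latency_ms": latency_ms})`, so the prompt is redacted at format time rather than at every call site.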