This file provides guidance for AI coding agents working on this codebase. It complements CONTRIBUTING.md with agent-specific instructions. Human contributors should follow the conventions in CONTRIBUTING.md.
Llama Stack is an API server implementing the OpenAI Responses API, Chat Completions, Embeddings, and supporting APIs (files, vector stores, batches, eval, safety). It supports multiple inference backends (OpenAI, Azure, Bedrock, vLLM, Ollama, WatsonX, etc.) through a provider architecture.
```
src/llama_stack/          # Server implementation
  core/                   # Request routing, server, storage
  providers/
    inline/               # Built-in providers (responses, eval, vector_io, etc.)
    remote/               # Remote provider adapters (OpenAI, Azure, vLLM, etc.)
    registry/             # Provider registration specs
    utils/                # Shared provider utilities (OpenAI mixin, MCP, etc.)
  distributions/          # Distribution configs (starter, ci-tests, etc.)
src/llama_stack_api/      # API definitions, Pydantic models, FastAPI routes
tests/
  unit/                   # Unit tests
  integration/            # Integration tests with recording/replay system
```
- Python 3.12 is required. Pre-commit hooks only work with 3.12.
- Use `uv` for all dependency management and script execution.
- Run scripts with `uv run`, never bare `python3` or `python`.
- Use standard library modules when possible.
- All function signatures must have type hints. Prefer Pydantic models for validation.
- Code must pass `mypy`. Check the exclude list in `pyproject.toml` for known exceptions.
- Use `def _function_name` for private functions.
- Prefer explicit top-level imports over inline imports.
- Do not use exceptions as control flow.
- Comments must add value. Do not write filler comments that describe the next line.
- Add comments only to clarify surprising behavior that is not obvious from the code.
- Good variable naming and clear code organization matters more than comments.
- Do NOT remove existing comments unless they are factually wrong.
- Error messages must be prefixed with "Failed to ...".
- Use debug logging when appropriate via `from llama_stack.log import get_logger`.
- The pre-commit hook `Ensure 'llama_stack.log' usage for logging` enforces that all logging uses the project's logger, not the standard library directly.
- Always use `--signoff` (`-s`) when creating commits.
- Do not amend commits and force push during PR review. Use new commits instead.
- Use `git merge main` to update branches, not `git rebase`.
- Commit messages must use conventional commits format and full sentences, not bullet points.
- Merge `upstream/main` before pushing a branch to avoid CI failures from stale code.
- Unit tests: `./scripts/unit-tests.sh` or `uv run pytest tests/unit/ -x --tb=short`
- Integration tests: `./scripts/integration-tests.sh` with the recording/replay system. Recordings are JSON files in `tests/integration/*/recordings/`. When modifying code that changes request bodies sent to providers, recordings may need to be re-recorded.
- Run pre-commit checks: `uv run pre-commit run --all-files`
Each provider implements a protocol (e.g., Inference, Responses, VectorIO) and is registered in `src/llama_stack/providers/registry/`. Provider specs include:
- `provider_type`: e.g., `remote::openai`, `inline::builtin`
- `module`: Python module path
- `config_class`: Pydantic config class path
- `api_dependencies`: APIs this provider depends on
- `deprecation_warning`: for deprecated providers (triggers a runtime warning)
- `toolgroup_id`: for tool_runtime providers that auto-register tool groups
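As a purely illustrative sketch, a spec carries information shaped like the dict below. The `module` and `config_class` paths are assumptions for illustration; real specs are `ProviderSpec` objects defined in `src/llama_stack/providers/registry/`, not plain dicts:

```python
# Hypothetical example values; check the registry modules for real specs.
example_spec = {
    "provider_type": "remote::openai",
    "module": "llama_stack.providers.remote.inference.openai",  # assumed path
    "config_class": "llama_stack.providers.remote.inference.openai.config.OpenAIConfig",  # assumed path
    "api_dependencies": ["inference"],
}
```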
Configuration classes must use Pydantic `Field` with `description` parameters; these descriptions generate the provider documentation automatically.
Distribution YAML files in `src/llama_stack/distributions/` are partially auto-generated. After changing provider configs, run:
```shell
uv run ./scripts/distro_codegen.py
uv run ./scripts/provider_codegen.py
```

Do not edit generated files in `docs/docs/providers/` manually.
When modifying or extending APIs:
- Update models in `src/llama_stack_api/`
- Regenerate OpenAPI specs: `uv run ./scripts/run_openapi_generator.sh`
- Check for breaking changes: the pre-commit hook `Check API spec for breaking changes` enforces backward compatibility.
- Include a test plan with a testing script and execution output in your PR description.
- Add the field to the Pydantic model in `src/llama_stack_api/`
- Thread it through the provider protocol and implementation
- Update affected distribution configs if needed
- Regenerate specs and docs
- Add unit test cases covering the new parameter
Use the existing `deprecation_warning` field on `InlineProviderSpec` or `RemoteProviderSpec`. Search for existing examples: `grep -r "deprecation_warning" src/llama_stack/providers/registry/`
Search the codebase for existing examples of the same pattern first.
Use grep to find how deprecation, validation, configuration, or aliasing is already
handled elsewhere.
When making code changes, check whether the following documentation needs updating:
- `ARCHITECTURE.md`: system overview, request flow, provider architecture, API layer, storage, configuration, and test recording system
- Module-level `README.md` files in key directories: `src/llama_stack/README.md`, `src/llama_stack/core/README.md`, `src/llama_stack/core/server/README.md`, `src/llama_stack/core/storage/README.md`, `src/llama_stack/core/routing_tables/README.md`, `src/llama_stack/providers/README.md`, `src/llama_stack/providers/inline/README.md`, `src/llama_stack/providers/remote/README.md`, `src/llama_stack/providers/registry/README.md`, `src/llama_stack/providers/utils/README.md`, `src/llama_stack/providers/utils/inference/README.md`, `src/llama_stack/providers/inline/agents/README.md`, `src/llama_stack/providers/inline/tool_runtime/README.md`, `src/llama_stack/providers/remote/inference/README.md`, `src/llama_stack/distributions/README.md`, `scripts/README.md`
- `tests/README.md`, `tests/unit/README.md`, `tests/integration/README.md`
These files help LLMs and new contributors navigate the codebase. If your change adds, removes, or renames modules, providers, APIs, or storage backends, update the relevant documentation to stay in sync.