This document outlines the release process for Llama Stack, providing predictability for the community on feature delivery timelines and release expectations.
Llama Stack follows Semantic Versioning with three release streams:
| Release Type | Cadence | Description |
|---|---|---|
| Major (X.0.0) | Every 6-8 months | Breaking changes, major new features, architectural changes |
| Minor (0.Y.0) | Monthly | New features, non-breaking API additions, significant improvements |
| Patch (0.0.Z) | Weekly | Bug fixes, security patches, documentation updates |
Releases follow the X.Y.Z pattern:
- X (Major): Incremented for breaking changes or significant architectural updates
- Y (Minor): Incremented for new features and non-breaking enhancements
- Z (Patch): Incremented for bug fixes and minor improvements
For minor and major releases, release candidates (RC) are published before the final release:
- Format: `vX.Y.ZrcN` (e.g., `v0.4.0rc1`, `v0.4.0rc2`)
- Python RC packages are published to test.pypi.org for community testing
- Multiple RCs may be issued until the release is stable
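As an illustration of the tag format described above, here is a minimal sketch that parses `vX.Y.Z` and `vX.Y.ZrcN` tags and orders release candidates before their final release. The names `ReleaseTag`, `parse_release_tag`, and `sort_key` are hypothetical helpers, not part of any Llama Stack tooling.

```python
import re
from typing import NamedTuple, Optional

# Matches tags such as v0.4.0 or v0.4.0rc1 (format taken from the text above).
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


class ReleaseTag(NamedTuple):
    major: int
    minor: int
    patch: int
    rc: Optional[int]  # None for a final release


def parse_release_tag(tag: str) -> ReleaseTag:
    """Split a release tag into its numeric components."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    major, minor, patch, rc = match.groups()
    return ReleaseTag(
        int(major), int(minor), int(patch), int(rc) if rc is not None else None
    )


def sort_key(tag: str):
    """Order tags so that every rcN sorts before the final X.Y.Z release."""
    parsed = parse_release_tag(tag)
    rc_rank = float("inf") if parsed.rc is None else parsed.rc
    return (parsed.major, parsed.minor, parsed.patch, rc_rank)


tags = ["v0.4.0", "v0.4.0rc2", "v0.4.0rc1"]
print(sorted(tags, key=sort_key))  # ['v0.4.0rc1', 'v0.4.0rc2', 'v0.4.0']
```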
- `main`: Active development branch, always contains the latest code
- `release-X.Y.x`: Release branches for each minor version (e.g., `release-0.3.x`, `release-0.4.x`)
- Patch releases are made from release branches
- Critical fixes are backported from `main` to active release branches using Mergify
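The branch naming convention above can be sketched as a small mapping from a version to its release branch. The function name `release_branch_for` is an assumption made for illustration.

```python
def release_branch_for(version: str) -> str:
    """Map a version such as '0.4.5' or 'v0.4.0rc1' to its release branch."""
    # Only the major and minor components matter; the rest is ignored.
    major, minor, _rest = version.lstrip("v").split(".", 2)
    return f"release-{int(major)}.{int(minor)}.x"


print(release_branch_for("0.4.5"))      # release-0.4.x
print(release_branch_for("v0.3.0rc1"))  # release-0.3.x
```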
- Issues only: Add only issues to milestones, not PRs (avoids duplicate tracking)
- Milestone creation: Create milestones for each planned minor and major release
- Small fixes: Quick-landing PRs for small fixes don't require milestone tracking
A version is released when:
- All issues in the corresponding milestone are completed, OR
- Remaining issues are moved to a future milestone with documented rationale
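The release criteria above could be checked along these lines. This is only a sketch: the `MilestoneIssue` record and its fields are hypothetical and do not correspond to a real GitHub API object.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class MilestoneIssue:
    # Hypothetical record; the fields are assumptions for this example.
    number: int
    closed: bool
    deferred_to: Optional[str] = None  # future milestone, if moved
    rationale: Optional[str] = None    # documented reason for moving


def blocking_issues(issues: Iterable[MilestoneIssue]) -> List[int]:
    """Return issue numbers that are neither completed nor properly deferred."""
    blockers = []
    for issue in issues:
        deferred = bool(issue.deferred_to) and bool(issue.rationale)
        if not issue.closed and not deferred:
            blockers.append(issue.number)
    return blockers


def ready_to_release(issues: Iterable[MilestoneIssue]) -> bool:
    """A version is ready when no blocking issues remain."""
    return not blocking_issues(issues)
```

An open issue that was moved without a documented rationale still counts as blocking in this sketch.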
- Triagers manage milestones and prioritize issues
- Discussions happen in the `#triage` Discord channel
- Priority decisions are reviewed in community calls
Each release has a designated Release Owner from the CODEOWNERS group who is responsible for:
- Creating a dedicated Discord thread in the `#release` channel
- Coordinating testing activities
- Managing the release timeline
- Publishing release artifacts
- Announcing the release
Testing requirements scale with release type:
Patch releases:
- Rely primarily on automated CI tests
- Quick turnaround for critical fixes
- Manual verification only for specific fix validation

Minor releases:
- Automated CI tests must pass
- Manual feature testing for new functionality
- Documentation verification
- Community testing window: 1 week
- Release candidates published for community validation

Major releases:
- Comprehensive automated test suite
- Scheduled testing period with predefined test plans
- Cross-provider compatibility testing
- Performance benchmarking
- Community testing window: 2-3 weeks
- Multiple release candidates as needed
For each release, the Release Owner should complete:
- Create release-specific thread in `#releases` Discord channel
- Complete the technical release steps below
- Generate release notes
- Announce in `#announcements` Discord channel
Pre-release on release-0.4.x:
Backports are handled automatically by Mergify — patch releases ship whatever has already been backported to the release branch. No manual cherry-picking needed.
- Run the Prepare release workflow:
  - Input `version`: `0.4.5`
  - Input `release_branch`: `release-0.4.x`
  - This commits `fallback_version` and `llama-stack-client` pin updates directly to the release branch
Release:
- Create GitHub release: tag `v0.4.5`, target `release-0.4.x`
- Verify all 4 packages published:
Post-release (automated):
The following steps are handled automatically by the Post-release automation workflow, which triggers on release: published:
- Tags `main` with `v0.4.6-dev` (next dev tag)
- Commits `fallback_version` bump to `"0.4.6.dev0"` directly to `main`
- Commits the npm lockfile update directly to `release-0.4.x`
All of the above, plus:
- Create `release-0.5.x` branch off `main`
- Ensure the release branch has the setuptools-scm config in both `pyproject.toml` files (`dynamic = ["version"]`, `[tool.setuptools_scm]`, etc.)
Each release includes:
- PyPI package: `llama-stack` and `llama-stack-client`
- npm package: `llama-stack-client`
- Docker images: Distribution images on Docker Hub
- GitHub Release: Tagged release with release notes
- Documentation: Updated docs at https://llamastack.github.io
See CONTRIBUTING.md for general contribution guidelines.
Llama Stack actively maintains the last 2 stable minor releases.
- Bug fixes: Critical bugs are backported to maintained release branches
- Security patches: Security vulnerabilities are patched in maintained releases
- Patch releases (Z-stream): Maintained releases receive regular patch releases
| Release | Status | Notes |
|---|---|---|
| Current minor (0.Y.0) | ✅ Actively maintained | Bug fixes and security patches |
| Previous minor (0.Y-1.0) | ✅ Maintained | Bug fixes and security patches |
| Older releases | ❌ Unmaintained | No backports; upgrade recommended |
If the current release is v0.4.x:
- `v0.4.x`: Actively maintained (current)
- `v0.3.x`: Maintained (bug fixes only)
- `v0.2.x` and earlier: Unmaintained
Users on unmaintained versions are encouraged to upgrade to continue receiving fixes.
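The "last 2 stable minor releases" rule can be expressed as a short helper. This is a sketch under the assumption that stable minors are given as `(major, minor)` tuples; `maintained_branches` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def maintained_branches(stable_minors: Iterable[Tuple[int, int]]) -> List[str]:
    """Return release branches for the two newest stable minor releases."""
    # Deduplicate and sort so the two newest (major, minor) pairs come last.
    latest_two = sorted(set(stable_minors))[-2:]
    return [f"release-{major}.{minor}.x" for major, minor in reversed(latest_two)]


print(maintained_branches([(0, 2), (0, 3), (0, 4)]))
# ['release-0.4.x', 'release-0.3.x']
```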
The unified workflow (.github/workflows/pypi.yml) builds and publishes all packages:
- Local packages (`llama-stack`, `llama-stack-api`): version comes from the git tag via `SETUPTOOLS_SCM_PRETEND_VERSION`
- External packages (`llama-stack-client` python/typescript): the workflow patches `pyproject.toml`/`_version.py`/`package.json` at build time using the tag version via `sed`/`npm version`
- `fallback_version` is only used for nightly/dev builds and Docker, not for releases
- The workflow always runs from `main` but checks out the tag's commit for local packages
| Trigger | Version | Target |
|---|---|---|
| `release: published` | From tag (`v0.4.5` → `0.4.5`) | pypi.org + npm |
| `schedule` (nightly) | `{base}.dev{YYYYMMDD}` (from dev tag or fallback) | test.pypi.org |
| `workflow_dispatch` `dry_run=test-pypi` | `{base}.dev{YYYYMMDD}` or manual version input | test.pypi.org |
| `workflow_dispatch` `dry_run=off` | Manual version input | pypi.org + npm |
| `workflow_dispatch` `dry_run=build-only` | N/A | No publish |
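The trigger table can be read as a small decision function. The sketch below mirrors the rows of the table only. It is not the workflow's actual logic, and `resolve_publish` with its parameters is a hypothetical name chosen for illustration.

```python
from datetime import date
from typing import Optional, Tuple


def dev_version(base: str, today: date) -> str:
    """Build a nightly/dev version of the form {base}.dev{YYYYMMDD}."""
    return f"{base}.dev{today:%Y%m%d}"


def resolve_publish(
    trigger: str,
    *,
    today: date,
    tag: Optional[str] = None,
    base: Optional[str] = None,
    dry_run: Optional[str] = None,
    manual_version: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Return (version, target) for a trigger, following the table rows."""
    if trigger == "release":
        if not tag:
            raise ValueError("a release trigger needs a tag")
        return tag.lstrip("v"), "pypi.org + npm"
    if trigger == "schedule":
        return dev_version(base, today), "test.pypi.org"
    if trigger == "workflow_dispatch":
        if dry_run == "test-pypi":
            return manual_version or dev_version(base, today), "test.pypi.org"
        if dry_run == "off":
            if not manual_version:
                raise ValueError("dry_run=off needs a manual version")
            return manual_version, "pypi.org + npm"
        if dry_run == "build-only":
            return None, "no publish"
    raise ValueError(f"unsupported trigger: {trigger!r} (dry_run={dry_run!r})")


print(resolve_publish("release", today=date(2025, 1, 7), tag="v0.4.5"))
# ('0.4.5', 'pypi.org + npm')
print(resolve_publish("schedule", today=date(2025, 1, 7), base="0.4.6"))
# ('0.4.6.dev20250107', 'test.pypi.org')
```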
Triggered via workflow_dispatch. Takes a version and release branch as input, then:
- Updates `fallback_version` to the release version in both `pyproject.toml` files
- Updates `llama-stack-client` pins to `==X.Y.Z`
- Opens a PR to the release branch
Triggered automatically after the pypi.yml workflow succeeds for a release event. Handles:
- Dev tag: Tags `main` with `vX.Y.(Z+1)-dev` so setuptools-scm can infer versions
- Fallback bump: Commits `fallback_version` bump to the next `.dev0` directly to `main`
- npm lockfile: Opens a PR to the release branch updating the UI lockfile
The nightly build (in pypi.yml) derives its base version from git describe --tags --match 'v*', using the dev tag pushed by the post-release workflow. fallback_version in pyproject.toml serves as a safety net for builds without git history (e.g., source tarballs).
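To illustrate how a base version might be recovered from `git describe` output, here is a hedged sketch. It assumes the usual `<tag>-<count>-g<sha>` shape that `git describe --tags` prints when the checkout is past the tag, and it is not the parsing the workflow actually performs. `base_from_describe` is a hypothetical name.

```python
import re
from typing import Optional

# Accepts 'v0.4.6-dev', 'v0.4.6-dev-12-gabc1234', or a plain 'v0.4.5'.
DESCRIBE_RE = re.compile(r"^v(\d+\.\d+\.\d+)(?:-dev)?(?:-\d+-g[0-9a-f]+)?$")


def base_from_describe(describe_output: str) -> Optional[str]:
    """Extract the X.Y.Z base version, or None if the output is unusable."""
    match = DESCRIBE_RE.match(describe_output.strip())
    return match.group(1) if match else None


print(base_from_describe("v0.4.6-dev-12-gabc1234"))  # 0.4.6
print(base_from_describe(""))                        # None
```

A `None` result corresponds to the case where no usable tag is available and a caller would fall back to `fallback_version`.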
The llama-stack-client==X.Y.Z pin in pyproject.toml can't be satisfied until the client is published, but the client is published in the same workflow run. Options:
- Change the pin to `>=X.Y.Z` or `~=X.Y` so it doesn't require an exact match that doesn't exist yet
- Remove the pin from the release branch entirely and let the workflow handle compatibility
- Publish client packages first in a separate step, then update pins, then publish llama-stack
Right now the workflow computes the version separately and passes it via SETUPTOOLS_SCM_PRETEND_VERSION. With dev tags now on main, setuptools-scm can potentially infer versions natively, which would:
- Eliminate the `compute-version` step entirely
- Eliminate `fallback_version` management (no more bumping it post-release)
- Make `uv build` work correctly locally without any env vars
- Let setuptools-scm generate dev versions automatically (e.g., `0.5.0.dev3+gabcdef` based on commits since last tag)
The llama-stack-client-python and llama-stack-client-typescript repos use static versions. The workflow patches them with sed at build time, which is fragile. If those repos adopted setuptools-scm (Python) or a similar scheme, the workflow could just set an env var instead of rewriting files.