5 changes: 1 addition & 4 deletions .gitignore
@@ -71,10 +71,6 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/source/getting_started/examples/

# PyBuilder
.pybuilder/
target/
@@ -144,6 +140,7 @@ venv.bak/

# mkdocs documentation
/site
docs/examples

# mypy
.mypy_cache/
7 changes: 2 additions & 5 deletions .readthedocs.yaml
@@ -8,11 +8,8 @@ build:
tools:
python: "3.12"

sphinx:
configuration: docs/source/conf.py
fail_on_warning: true
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats: []
mkdocs:
configuration: mkdocs.yaml

# Optionally declare the Python requirements required to build your docs
python:
28 changes: 28 additions & 0 deletions docs/.nav.yml
@@ -0,0 +1,28 @@
nav:
- Home:
- vLLM Spyre Plugin: README.md
- Getting Started:
- Installation: getting_started/installation.md
- Deploying:
- Docker: deploying/docker.md
- Kubernetes: deploying/k8s.md
- Examples:
- Offline Inference: examples/offline_inference
- Other: examples/other
- User Guide:
- Configuration: user_guide/configuration.md
- Environment Variables: user_guide/env_vars.md
- Supported Features: user_guide/supported_features.md
- Supported Models: user_guide/supported_models.md
- Developer Guide:
- Contributing: contributing/README.md

- Getting Started:
- Installation: getting_started/installation.md
- User Guide:
- Configuration: user_guide/configuration.md
- Environment Variables: user_guide/env_vars.md
- Supported Features: user_guide/supported_features.md
- Supported Models: user_guide/supported_models.md
- Developer Guide:
- Contributing: contributing/README.md
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

30 changes: 13 additions & 17 deletions docs/README.md
@@ -1,22 +1,18 @@
# vLLM Spyre Plugin docs
# Welcome to the vLLM Spyre Plugin

Live doc: [vllm-spyre.readthedocs.io](https://vllm-spyre.readthedocs.io)
<p style="text-align:center">
<script async defer src="https://buttons.github.io/buttons.js"></script>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre" data-show-count="true" data-size="large" aria-label="Star">Star</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
</p>

## Build the docs
**IBM Spyre** is the first production-grade Artificial Intelligence Unit (AIU) accelerator born out of the IBM Research AIU family, and is part of a long-term strategy of developing novel architectures and full-stack technology solutions for the emerging space of generative AI. Spyre builds on the foundation of IBM’s internal AIU research and delivers a scalable, efficient architecture for accelerating AI in enterprise environments.

```bash
# Install dependencies.
pip install -r requirements-docs.txt
The vLLM Spyre plugin (`vllm-spyre`) is a dedicated backend extension that enables seamless integration of IBM Spyre Accelerator with vLLM. It follows the architecture described in [vLLM's Plugin System](https://docs.vllm.ai/en/latest/design/plugin_system.html), making it easy to integrate IBM's advanced AI acceleration into existing vLLM workflows.

# Build the docs.
make clean
make html
```
For more information, check out the following:

## Open the docs with your browser

```bash
python -m http.server -d _build/html/
```

Launch your browser and open [localhost:8000](http://localhost:8000/).
- 📚 [Meet the IBM Artificial Intelligence Unit](https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu)
- 📽️ [AI Accelerators: Transforming Scalability & Model Efficiency](https://www.youtube.com/watch?v=KX0qBM-ByAg)
- 🚀 [Spyre Accelerator for IBM Z](https://research.ibm.com/blog/spyre-for-z)
161 changes: 161 additions & 0 deletions docs/contributing/README.md
@@ -0,0 +1,161 @@
# Contributing to vLLM Spyre

Thank you for your interest in contributing to the Spyre plugin for vLLM! There are several ways you can contribute:

- Identify and report any issues or bugs.
- Suggest or implement new features.
- Improve documentation or contribute a how-to guide.

## Issues

If you encounter a bug or have a feature request, please search [existing issues](https://github.com/vllm-project/vllm-spyre/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [create a new issue](https://github.com/vllm-project/vllm-spyre/issues/new/choose), providing as much relevant information as possible.

You can also reach out for support in the `#sig-spyre` channel in the [vLLM Slack](https://inviter.co/vllm-slack) workspace.

## Developing

### Building the docs with MkDocs

#### Install MkDocs and Plugins

Install MkDocs along with the [plugins](https://github.com/vllm-project/vllm-spyre/blob/main/mkdocs.yaml) used in the vLLM Spyre documentation.

```bash
pip install -r docs/requirements-docs.txt
```

!!! note
Ensure that your Python version is compatible with the plugins (e.g., `mkdocs-awesome-nav` requires Python 3.10+).

#### Start the Development Server

MkDocs comes with a built-in dev-server that lets you preview your documentation as you work on it.

Make sure you're in the same directory as the `mkdocs.yaml` configuration file in the `vllm-spyre` repository, then start the server by running the `mkdocs serve` command:

```bash
mkdocs serve
```

Example output:

```console
INFO - Documentation built in 106.83 seconds
INFO - [22:02:02] Watching paths for changes: 'docs', 'mkdocs.yaml'
INFO - [22:02:02] Serving on http://127.0.0.1:8000/
```

#### View in Your Browser

Open [http://127.0.0.1:8000/](http://127.0.0.1:8000/) in your browser to see a live preview.

#### Learn More

For additional features and advanced configurations, refer to the official [MkDocs Documentation](https://www.mkdocs.org/).

## Testing

### Testing Locally on CPU (No Spyre card)

!!! tip
`xgrammar` is automatically installed on `x86_64` systems.

Install `xgrammar` (only for `arm64` systems):

```sh
uv pip install xgrammar==0.1.19
```

Optionally, download the `JackFram/llama-160m` model:

```sh
python -c "from transformers import pipeline; pipeline('text-generation', model='JackFram/llama-160m')"
```

!!! caution
The Hugging Face API download does **not** work on `arm64`.

By default, the model is saved to `.cache/huggingface/hub/models--JackFram--llama-160m`.
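To confirm the download, you can inspect the local Hugging Face cache. The path below assumes the default cache location mentioned above; set `HF_HOME` if you use a non-default one.

```shell
# List cached Hugging Face models; prints a note if nothing is cached yet.
ls "${HF_HOME:-$HOME/.cache/huggingface}/hub" 2>/dev/null || echo "no models cached yet"
```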

Then, source the environment variables:

```sh
source _local_envs_for_test.sh
```

Optionally, install development dependencies:

```sh
uv pip install --group dev
```

Now, you can run the tests:

```sh
python -m pytest -v -x tests -m "v1 and cpu and e2e"
```

Here is the list of `pytest` markers you can use to filter tests:

```python
--8<-- "pyproject.toml:test-markers-definition"
```

### Testing Continuous Batching

!!! attention
Continuous batching currently requires the custom installation described below until the FMS custom branch is merged into main.

After completing the setup steps above, install the custom FMS branch to enable support for continuous batching:

```sh
uv pip install git+https://github.com/foundation-model-stack/foundation-model-stack.git@paged_attn_mock --force-reinstall
```

Then, run the continuous batching tests:

```sh
python -m pytest -v -x tests/e2e -m cb
```

## Pull Requests

### Linting

When submitting a PR, please make sure your code passes all linting checks. You can install the linting requirements using either `uv` or `pip`.

Using `uv`:

```sh
uv sync --frozen --group lint --active --inexact
```

Using `pip`:

```sh
uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
```

After installing the requirements, run the formatting script:

```sh
bash format.sh
```

Then, make sure to commit any changes made by the formatter:

```sh
git add .
git commit -s -m "Apply linting and formatting"
```

### DCO and Signed-off-by

When contributing, you must agree to the [DCO](https://github.com/vllm-project/vllm-spyre/blob/main/DCO). Commits must include a `Signed-off-by:` header, which certifies agreement with the terms of the DCO.

Using `-s` with `git commit` will automatically add this header.
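For illustration, here is a sketch of the resulting trailer, using a throwaway repository and a hypothetical author configured inline with `-c`:

```shell
# Demonstrate the Signed-off-by trailer that `git commit -s` appends.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q -s --allow-empty -m "Fix typo"
# The message body now ends with: Signed-off-by: Jane Doe <jane@example.com>
git log -1 --format=%B
```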

## License

See <gh-file:LICENSE>.
17 changes: 7 additions & 10 deletions docs/source/deploying/docker.md → docs/deploying/docker.md
@@ -8,17 +8,15 @@ TODO: Add section on RHOAI officially supported images, once they exist

Base images containing the driver stack for IBM Spyre accelerators are available from the [ibm-aiu](https://quay.io/repository/ibm-aiu/base?tab=tags) organization on Quay. This includes the `torch_sendnn` package, which is required for using torch with Spyre cards.

:::{attention}
These images contain an install of the `torch` package. The specific version installed is guaranteed to be compatible with `torch_sendnn`. Overwriting this install with a different version of `torch` may cause issues.
:::
!!! attention
These images contain an install of the `torch` package. The specific version installed is guaranteed to be compatible with `torch_sendnn`. Overwriting this install with a different version of `torch` may cause issues.

## Using community built images

Community maintained images are also [available on quay](https://quay.io/repository/ibm-aiu/vllm-spyre?tab=tags), the latest x86 build is `quay.io/ibm-aiu/vllm-spyre:latest.amd64`.
Community maintained images are also [available on Quay](https://quay.io/repository/ibm-aiu/vllm-spyre?tab=tags); the latest x86 build is `quay.io/ibm-aiu/vllm-spyre:latest.amd64`.

:::{caution}
These images are provided as a reference and come with no support guarantees.
:::
!!! caution
These images are provided as a reference and come with no support guarantees.

## Building vLLM Spyre's Docker Image from Source

@@ -28,9 +26,8 @@ You can build and run vLLM Spyre from source via the provided <gh-file:docker/Do
DOCKER_BUILDKIT=1 docker build . --target release --tag vllm/vllm-spyre --file docker/Dockerfile.amd64
```

:::{note}
This Dockerfile currently only supports the x86 platform
:::
!!! note
This Dockerfile currently only supports the x86 platform.

## Running vLLM Spyre in a Docker Container

7 changes: 3 additions & 4 deletions docs/source/deploying/k8s.md → docs/deploying/k8s.md
@@ -4,13 +4,12 @@ The vLLM Documentation on [Deploying with Kubernetes](https://docs.vllm.ai/en/la

## Deploying on Spyre Accelerators

:::{note}
**Prerequisite** Ensure that you have a running Kubernetes cluster with Spyre accelerators.
:::
!!! note
**Prerequisite**: Ensure that you have a running Kubernetes cluster with Spyre accelerators.

<!-- TODO: Link to public docs for cluster setup -->

1. Create PVCs and secrets for vLLM. These are all optional.
1. (Optional) Create PVCs and secrets for vLLM.

```yaml
apiVersion: v1
47 changes: 47 additions & 0 deletions docs/getting_started/installation.md
@@ -0,0 +1,47 @@
# Installation

We use the [uv](https://docs.astral.sh/uv/) package manager to manage the
installation of the plugin and its dependencies. `uv` provides advanced
dependency resolution which is required to properly install dependencies like
`vllm` without overwriting critical dependencies like `torch`.

First, clone the `vllm-spyre` repo:

```sh
git clone https://github.com/vllm-project/vllm-spyre.git
cd vllm-spyre
```

Then, install `uv`:

```sh
pip install uv
```

Now, create and activate a new [venv](https://docs.astral.sh/uv/pip/environments/):

```sh
uv venv --python 3.12 --seed .venv
source .venv/bin/activate
```

To install `vllm-spyre` locally with development dependencies, use the following command:

```sh
uv sync --frozen --active --inexact
```

To include optional linting dependencies, include `--group lint`:

```sh
uv sync --frozen --active --inexact --group lint
```

!!! tip
The `dev` group (i.e. `--group dev`) is enabled by default.

Finally, `torch` is needed to run examples and tests. If it is not already installed, install it using `pip`:

```sh
pip install torch==2.7.0
```
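A quick, hypothetical pre-check for whether `torch` is already present in the active environment (and which version), so you can skip the install step if it is:

```shell
# Print the installed torch version, or "missing" if torch is not installed.
python - <<'PY'
import importlib.util
if importlib.util.find_spec("torch"):
    import torch
    print(torch.__version__)
else:
    print("missing")
PY
```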