5 changes: 1 addition & 4 deletions .gitignore
@@ -71,10 +71,6 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/source/getting_started/examples/

# PyBuilder
.pybuilder/
target/
@@ -144,6 +140,7 @@ venv.bak/

# mkdocs documentation
/site
docs/examples

# mypy
.mypy_cache/
7 changes: 2 additions & 5 deletions .readthedocs.yaml
@@ -8,11 +8,8 @@ build:
tools:
python: "3.12"

sphinx:
configuration: docs/source/conf.py
fail_on_warning: true
# If using Sphinx, optionally build your docs in additional formats such as PDF
formats: []
mkdocs:
configuration: mkdocs.yaml

# Optionally declare the Python requirements required to build your docs
python:
28 changes: 28 additions & 0 deletions docs/.nav.yml
@@ -0,0 +1,28 @@
nav:
- Home:
- vLLM Spyre Plugin: README.md
- Getting Started:
- Installation: getting_started/installation.md
- Deploying:
- Docker: deploying/docker.md
- Kubernetes: deploying/k8s.md
- Examples:
- Offline Inference: examples/offline_inference
- Other: examples/other
- User Guide:
- Configuration: user_guide/configuration.md
- Environment Variables: user_guide/env_vars.md
- Supported Features: user_guide/supported_features.md
- Supported Models: user_guide/supported_models.md
- Developer Guide:
- Contributing: contributing/README.md

- Getting Started:
- Installation: getting_started/installation.md
- User Guide:
- Configuration: user_guide/configuration.md
- Environment Variables: user_guide/env_vars.md
- Supported Features: user_guide/supported_features.md
- Supported Models: user_guide/supported_models.md
- Developer Guide:
- Contributing: contributing/README.md
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

30 changes: 13 additions & 17 deletions docs/README.md
@@ -1,22 +1,18 @@
# vLLM Spyre Plugin docs
# Welcome to the vLLM Spyre Plugin

Live doc: [vllm-spyre.readthedocs.io](https://vllm-spyre.readthedocs.io)
<p style="text-align:center">
<script async defer src="https://buttons.github.io/buttons.js"></script>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre" data-show-count="true" data-size="large" aria-label="Star">Star</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch">Watch</a>
<a class="github-button" href="https://github.com/vllm-project/vllm-spyre/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
</p>

## Build the docs
**IBM Spyre** is the first production-grade Artificial Intelligence Unit (AIU) accelerator born out of the IBM Research AIU family, and is part of a long-term strategy of developing novel architectures and full-stack technology solutions for the emerging space of generative AI. Spyre builds on the foundation of IBM’s internal AIU research and delivers a scalable, efficient architecture for accelerating AI in enterprise environments.

```bash
# Install dependencies.
pip install -r requirements-docs.txt
The vLLM Spyre plugin (`vllm-spyre`) is a dedicated backend extension that enables seamless integration of IBM Spyre Accelerator with vLLM. It follows the architecture described in [vLLM's Plugin System](https://docs.vllm.ai/en/latest/design/plugin_system.html), making it easy to integrate IBM's advanced AI acceleration into existing vLLM workflows.

# Build the docs.
make clean
make html
```
For more information, check out the following:

## Open the docs with your browser

```bash
python -m http.server -d _build/html/
```

Launch your browser and open [localhost:8000](http://localhost:8000/).
- 📚 [Meet the IBM Artificial Intelligence Unit](https://research.ibm.com/blog/ibm-artificial-intelligence-unit-aiu)
- 📽️ [AI Accelerators: Transforming Scalability & Model Efficiency](https://www.youtube.com/watch?v=KX0qBM-ByAg)
- 🚀 [Spyre Accelerator for IBM Z](https://research.ibm.com/blog/spyre-for-z)
161 changes: 161 additions & 0 deletions docs/contributing/README.md
@@ -0,0 +1,161 @@
# Contributing to vLLM Spyre

Thank you for your interest in contributing to the Spyre plugin for vLLM! There are several ways you can contribute:

- Identify and report any issues or bugs.
- Suggest or implement new features.
- Improve documentation or contribute a how-to guide.

## Issues

If you encounter a bug or have a feature request, please search [existing issues](https://github.com/vllm-project/vllm-spyre/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [create a new issue](https://github.com/vllm-project/vllm-spyre/issues/new/choose), providing as much relevant information as possible.

You can also reach out for support in the `#sig-spyre` channel in the [vLLM Slack](https://inviter.co/vllm-slack) workspace.

## Developing

### Building the docs with MkDocs

#### Install MkDocs and Plugins

Install MkDocs along with the [plugins](https://github.com/vllm-project/vllm-spyre/blob/main/mkdocs.yaml) used in the vLLM Spyre documentation.

```bash
pip install -r docs/requirements-docs.txt
```

!!! note
Ensure that your Python version is compatible with the plugins (e.g., `mkdocs-awesome-nav` requires Python 3.10+).

#### Start the Development Server

MkDocs comes with a built-in dev-server that lets you preview your documentation as you work on it.

Make sure you're in the same directory as the `mkdocs.yaml` configuration file in the `vllm-spyre` repository, then start the server by running the `mkdocs serve` command:

```bash
mkdocs serve
```

Example output:

```console
INFO - Documentation built in 106.83 seconds
INFO - [22:02:02] Watching paths for changes: 'docs', 'mkdocs.yaml'
INFO - [22:02:02] Serving on http://127.0.0.1:8000/
```

#### View in Your Browser

Open [http://127.0.0.1:8000/](http://127.0.0.1:8000/) in your browser to see a live preview.

#### Learn More

For additional features and advanced configurations, refer to the official [MkDocs Documentation](https://www.mkdocs.org/).

## Testing

### Testing Locally on CPU (No Spyre card)

!!! tip
`xgrammar` is automatically installed on `x86_64` systems.

Install `xgrammar` (only for `arm64` systems):

```sh
uv pip install xgrammar==0.1.19
```

Optionally, download the `JackFram/llama-160m` model:

```sh
python -c "from transformers import pipeline; pipeline('text-generation', model='JackFram/llama-160m')"
```

!!! caution
The Hugging Face API download does **not** work on `arm64`.

By default, the model is saved to `.cache/huggingface/hub/models--JackFram--llama-160m`.
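To confirm the download, you can inspect the local Hugging Face cache. The path below assumes the default cache location mentioned above; set `HF_HOME` if you use a non-default one.

```shell
# List cached Hugging Face models; prints a note if nothing is cached yet.
ls "${HF_HOME:-$HOME/.cache/huggingface}/hub" 2>/dev/null || echo "no models cached yet"
```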

Then, source the environment variables:

```sh
source _local_envs_for_test.sh
```

Optionally, install development dependencies:

```sh
uv pip install --group dev
```

Now, you can run the tests:

```sh
python -m pytest -v -x tests -m "v1 and cpu and e2e"
```

Here is the list of `pytest` markers you can use to filter tests:

```python
--8<-- "pyproject.toml:test-markers-definition"
```

### Testing Continuous Batching

!!! attention
Continuous batching currently requires the custom installation described below until the FMS custom branch is merged into main.

After completing the setup steps above, install the custom FMS branch to enable support for continuous batching:

```sh
uv pip install git+https://github.com/foundation-model-stack/foundation-model-stack.git@paged_attn_mock --force-reinstall
```

Then, run the continuous batching tests:

```sh
python -m pytest -v -x tests/e2e -m cb
```

## Pull Requests

### Linting

When submitting a PR, please make sure your code passes all linting checks. You can install the linting requirements using either `uv` or `pip`.

Using `uv`:

```sh
uv sync --frozen --group lint --active --inexact
```

Using `pip`:

```sh
uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
```

After installing the requirements, run the formatting script:

```sh
bash format.sh
```

Then, make sure to commit any changes made by the formatter:

```sh
git add .
git commit -s -m "Apply linting and formatting"
```

### DCO and Signed-off-by

When contributing, you must agree to the [DCO](https://github.com/vllm-project/vllm-spyre/blob/main/DCO). Commits must include a `Signed-off-by:` header, which certifies agreement with the terms of the DCO.

Using `-s` with `git commit` will automatically add this header.
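For illustration, here is a sketch of the resulting trailer, using a throwaway repository and a hypothetical author configured inline with `-c`:

```shell
# Demonstrate the Signed-off-by trailer that `git commit -s` appends.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q -s --allow-empty -m "Fix typo"
# The message body now ends with: Signed-off-by: Jane Doe <jane@example.com>
git log -1 --format=%B
```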

## License

See <gh-file:LICENSE>.
17 changes: 7 additions & 10 deletions docs/source/deploying/docker.md → docs/deploying/docker.md
@@ -8,17 +8,15 @@ TODO: Add section on RHOAI officially supported images, once they exist

Base images containing the driver stack for IBM Spyre accelerators are available from the [ibm-aiu](https://quay.io/repository/ibm-aiu/base?tab=tags) organization on Quay. This includes the `torch_sendnn` package, which is required for using torch with Spyre cards.

:::{attention}
These images contain an install of the `torch` package. The specific version installed is guaranteed to be compatible with `torch_sendnn`. Overwriting this install with a different version of `torch` may cause issues.
:::
!!! attention
These images contain an install of the `torch` package. The specific version installed is guaranteed to be compatible with `torch_sendnn`. Overwriting this install with a different version of `torch` may cause issues.

## Using community built images

Community maintained images are also [available on quay](https://quay.io/repository/ibm-aiu/vllm-spyre?tab=tags), the latest x86 build is `quay.io/ibm-aiu/vllm-spyre:latest.amd64`.
Community maintained images are also [available on Quay](https://quay.io/repository/ibm-aiu/vllm-spyre?tab=tags); the latest x86 build is `quay.io/ibm-aiu/vllm-spyre:latest.amd64`.

:::{caution}
These images are provided as a reference and come with no support guarantees.
:::
!!! caution
These images are provided as a reference and come with no support guarantees.

## Building vLLM Spyre's Docker Image from Source

@@ -28,9 +26,8 @@ You can build and run vLLM Spyre from source via the provided <gh-file:docker/Do
DOCKER_BUILDKIT=1 docker build . --target release --tag vllm/vllm-spyre --file docker/Dockerfile.amd64
```

:::{note}
This Dockerfile currently only supports the x86 platform
:::
!!! note
This Dockerfile currently only supports the x86 platform.

## Running vLLM Spyre in a Docker Container

7 changes: 3 additions & 4 deletions docs/source/deploying/k8s.md → docs/deploying/k8s.md
@@ -4,13 +4,12 @@ The vLLM Documentation on [Deploying with Kubernetes](https://docs.vllm.ai/en/la

## Deploying on Spyre Accelerators

:::{note}
**Prerequisite** Ensure that you have a running Kubernetes cluster with Spyre accelerators.
:::
!!! note
**Prerequisite**: Ensure that you have a running Kubernetes cluster with Spyre accelerators.

<!-- TODO: Link to public docs for cluster setup -->

1. Create PVCs and secrets for vLLM. These are all optional.
1. (Optional) Create PVCs and secrets for vLLM.

```yaml
apiVersion: v1
47 changes: 47 additions & 0 deletions docs/getting_started/installation.md
@@ -0,0 +1,47 @@
# Installation

We use the [uv](https://docs.astral.sh/uv/) package manager to manage the
installation of the plugin and its dependencies. `uv` provides advanced
dependency resolution which is required to properly install dependencies like
`vllm` without overwriting critical dependencies like `torch`.

First, clone the `vllm-spyre` repo:

```sh
git clone https://github.com/vllm-project/vllm-spyre.git
cd vllm-spyre
```

Then, install `uv`:

```sh
pip install uv
```

Now, create and activate a new [venv](https://docs.astral.sh/uv/pip/environments/):

```sh
uv venv --python 3.12 --seed .venv
source .venv/bin/activate
```

To install `vllm-spyre` locally with development dependencies, use the following command:

```sh
uv sync --frozen --active --inexact
```

To include optional linting dependencies, include `--group lint`:

```sh
uv sync --frozen --active --inexact --group lint
```

!!! tip
The `dev` group (i.e. `--group dev`) is enabled by default.

Finally, `torch` is needed to run examples and tests. If it is not already installed, install it using `pip`:

```sh
pip install torch==2.7.0
```
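A quick, hypothetical pre-check for whether `torch` is already present in the active environment (and which version), so you can skip the install step if it is:

```shell
# Print the installed torch version, or "missing" if torch is not installed.
python - <<'PY'
import importlib.util
if importlib.util.find_spec("torch"):
    import torch
    print(torch.__version__)
else:
    print("missing")
PY
```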