
Commit 5e8a8ed

maxdebayser and rafvasq authored
📜 Add documentation and diagrams on the plugin architecture (#530)
# Description

Add documentation on how the plugin fits into the vLLM architecture.

---------

Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Maximilien de Bayser <[email protected]>
Co-authored-by: Rafael Vasquez <[email protected]>
1 parent f6a83ce commit 5e8a8ed

6 files changed: 51 additions and 1 deletion


docs/.nav.yml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ nav:
 - Supported Models: user_guide/supported_models.md
 - Developer Guide:
 - Contributing: contributing/README.md
+- Architecture: contributing/architecture.md
 - Continuous Batching:
 - Overview: contributing/continuous_batching/overview.md
 - Tests:

docs/contributing/architecture.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Plugin Architecture
+
+The Spyre plugin extends or replaces three main components in vLLM:
+
+1. Scheduler
+2. Model worker and model runner
+3. Modeling code
+
+To better understand these modifications, it's helpful to
+consider the native vLLM architecture for GPUs.
+
+![vLLM architecture](images/vllm_v1.svg)
+
+The API server, the engine core, and the workers live in
+different processes. All three refer to the platform API for
+backend-specific concerns.
+
+In vLLM-Spyre, we implement a platform API that is
+loaded at vLLM startup and bootstraps all other components.
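As a rough illustration of what "bootstraps all other components" can mean, here is a minimal, self-contained sketch. Every name in it (`ToyEngineConfig`, `ToySpyrePlatform`, `apply_overrides`, and the class paths) is hypothetical and is not the vLLM or vLLM-Spyre API. The only assumption is that a platform object gets one chance at startup to rewrite the configuration so that later stages pick up plugin-provided classes.

```python
from dataclasses import dataclass


@dataclass
class ToyEngineConfig:
    """Hypothetical stand-in for the engine configuration."""
    scheduler_cls: str = "toy_engine.DefaultScheduler"
    worker_cls: str = "toy_engine.GpuWorker"


class ToySpyrePlatform:
    """Hypothetical platform object consulted once at startup."""

    @staticmethod
    def apply_overrides(config: ToyEngineConfig) -> ToyEngineConfig:
        # Point the engine at plugin-provided classes (illustrative paths only).
        config.scheduler_cls = "toy_plugin.SpyreScheduler"
        config.worker_cls = "toy_plugin.SpyreWorker"
        return config


config = ToySpyrePlatform.apply_overrides(ToyEngineConfig())
print(config.scheduler_cls, config.worker_cls)
```

Keeping the overrides as class paths in the configuration, rather than importing the classes directly, is one way to let each process resolve them itself after it starts.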
+
+![vLLM Spyre architecture](images/vllm_v1_spyre.svg)
+
+As the diagram shows, the plugin mainly modifies the engine core
+and worker processes. The platform API also includes request validation hooks
+that the API server invokes to ensure that the backend
+can handle each request.
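A request validation hook of this kind might look like the sketch below. The function name `validate_request`, its parameters, and the length rule are assumptions made for illustration; the actual checks performed by the plugin are not described in this document.

```python
def validate_request(prompt_len: int, max_new_tokens: int, max_model_len: int) -> None:
    """Raise ValueError if the request cannot fit within the supported length."""
    if prompt_len <= 0:
        raise ValueError("prompt must contain at least one token")
    total = prompt_len + max_new_tokens
    if total > max_model_len:
        raise ValueError(
            f"request needs {total} tokens, "
            f"but the backend supports at most {max_model_len}"
        )


# Accepted: 100 + 20 tokens fit within a limit of 128.
validate_request(prompt_len=100, max_new_tokens=20, max_model_len=128)

# Rejected up front, before the request reaches the engine core.
try:
    validate_request(prompt_len=120, max_new_tokens=20, max_model_len=128)
except ValueError as err:
    print(f"rejected: {err}")
```

Rejecting a request in the API server process means the caller gets an immediate error instead of a failure later in the engine core or a worker.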
+
+In the engine core, we customize the scheduler to handle the constraints
+of static batching and continuous batching.
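The difference between the two batching modes can be sketched with a toy admission step. This is not the Spyre scheduler: `schedule_step` and its parameters are hypothetical, and the sketch assumes the usual definitions, where static batching forms a new batch only once the current one has finished, while continuous batching admits waiting requests whenever a slot is free.

```python
from collections import deque


def schedule_step(waiting: deque, running: list, max_batch_size: int,
                  continuous: bool) -> list:
    """Move requests from `waiting` to `running`; return the admitted ones."""
    if not continuous and running:
        # Static batching: the current batch must drain before a new one forms.
        return []
    admitted = []
    while waiting and len(running) + len(admitted) < max_batch_size:
        admitted.append(waiting.popleft())
    running.extend(admitted)
    return admitted


# One request is already running and the batch has room for two.
static_admitted = schedule_step(deque(["a", "b"]), ["x"], 2, continuous=False)
cb_admitted = schedule_step(deque(["a", "b"]), ["x"], 2, continuous=True)
print(static_admitted, cb_admitted)
```

In the static case nothing is admitted until `running` is empty; in the continuous case request `a` joins the in-flight batch immediately.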
+
+The changes are broader in the worker process. Most of the main
+classes have Spyre-specific implementations. From the vLLM code, we mainly
+reuse the sampling code (including logits processing) and the pooling
+code for non-generative use cases.
+
+We provide model runners for three cases: static batching, continuous batching, and
+pooling. The pooling model runner is very similar to the static batching one,
+except that it performs pooling instead of sampling and
+uses the `transformers` modeling code instead of the `foundation-model-stack`
+code.
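The choice among the three model runners can be pictured as a small dispatch function. The names `RunnerKind` and `select_runner`, the two boolean inputs, and the rule that pooling takes precedence are all assumptions for this sketch; the document does not state how the plugin actually selects a runner.

```python
from enum import Enum


class RunnerKind(Enum):
    STATIC_BATCHING = "static_batching"
    CONTINUOUS_BATCHING = "continuous_batching"
    POOLING = "pooling"


def select_runner(is_pooling_model: bool, use_continuous_batching: bool) -> RunnerKind:
    """Pick a runner kind; pooling models are assumed to take precedence."""
    if is_pooling_model:
        return RunnerKind.POOLING
    if use_continuous_batching:
        return RunnerKind.CONTINUOUS_BATCHING
    return RunnerKind.STATIC_BATCHING


print(select_runner(is_pooling_model=False, use_continuous_batching=True))
```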

docs/contributing/images/vllm_v1.svg

Lines changed: 4 additions & 0 deletions

docs/contributing/images/vllm_v1_spyre.svg

Lines changed: 4 additions & 0 deletions

format.sh

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ echo 'vLLM mypy: Done'
 # https://github.com/codespell-project/codespell/issues/1915
 # Avoiding the "./" prefix and using "/**" globs for directories appears to solve the problem
 CODESPELL_EXCLUDES=(
-'--skip' 'tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,build/**'
+'--skip' 'tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,build/**,*.svg'
 )

 # check spelling of specified files

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ exclude = [

 [tool.codespell]
 ignore-words-list = "dout, te, indicies, subtile, ElementE"
+skip = "*.svg"
 #skip = "./tests/models/fixtures,./tests/prompts,./benchmarks/sonnet.txt,./tests/lora/data,./build"

 [tool.isort]
