📜 Add documentation and diagrams on the plugin architecture (#530)

maxdebayser · rafvasq · web-flow · commit 5e8a8ed6c28f · 2025-10-16T10:02:47.000-03:00
# Description

Add documentation on how the plugin fits into the vLLM architecture.

---------

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
diff --git a/docs/.nav.yml b/docs/.nav.yml
@@ -18,6 +18,7 @@ nav:
       - Supported Models: user_guide/supported_models.md
     - Developer Guide:
       - Contributing: contributing/README.md
+      - Architecture: contributing/architecture.md
       - Continuous Batching:
         - Overview: contributing/continuous_batching/overview.md
         - Tests:
diff --git a/docs/contributing/architecture.md b/docs/contributing/architecture.md
@@ -0,0 +1,40 @@
+# Plugin Architecture
+
+The Spyre plugin extends or replaces three main components in vLLM:
+
+1. Scheduler
+2. Model worker and model runner
+3. Modeling code
+
+To better understand these modifications, it helps to first
+look at the architecture of native vLLM running on GPUs.
+
+![vLLM architecture](images/vllm_v1.svg)
+
+The API server, the engine core, and the workers live in
+different processes. All three rely on the platform API for
+backend-specific concerns.
+
+In vLLM-Spyre, we implement a platform API that is
+loaded at vLLM startup and bootstraps all other components.
+
+![vLLM Spyre architecture](images/vllm_v1_spyre.svg)
+
+As we can see in the diagram, the plugin mainly modifies the engine core
+and worker processes. The platform API includes request validation hooks
+that the API server invokes to ensure that the requests
+can be handled by the backend.
+
+In the engine core, we customize the scheduler to handle the constraints
+of static batching and continuous batching.
+
+The changes are broader in the worker process. Most of the main
+classes have Spyre-specific implementations. From the vLLM code, we mainly
+reuse the sampling code (including logits processing) and the pooling
+code for non-generative use cases.
+
+We provide model runners for three cases: static batching, continuous batching and
+pooling. The pooling model runner is very similar to the static batching one,
+except that it does pooling instead of sampling and
+uses the `transformers` modeling code instead of the `foundation-model-stack`
+code.
diff --git a/docs/contributing/images/vllm_v1.svg b/docs/contributing/images/vllm_v1.svg
diff --git a/docs/contributing/images/vllm_v1_spyre.svg b/docs/contributing/images/vllm_v1_spyre.svg
diff --git a/format.sh b/format.sh
@@ -118,7 +118,7 @@ echo 'vLLM mypy: Done'
 # https://github.com/codespell-project/codespell/issues/1915
 # Avoiding the "./" prefix and using "/**" globs for directories appears to solve the problem
 CODESPELL_EXCLUDES=(
-    '--skip' 'tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,build/**'
+    '--skip' 'tests/prompts/**,./benchmarks/sonnet.txt,*tests/lora/data/**,build/**,*.svg'
 )
 
 # check spelling of specified files
diff --git a/pyproject.toml b/pyproject.toml
@@ -125,6 +125,7 @@ exclude = [
 
 [tool.codespell]
 ignore-words-list = "dout, te, indicies, subtile, ElementE"
+skip = "*.svg"
 #skip = "./tests/models/fixtures,./tests/prompts,./benchmarks/sonnet.txt,./tests/lora/data,./build"
 
 [tool.isort]
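
Reviewer note on `docs/contributing/architecture.md`: the new page describes three roles for the platform layer (bootstrapping the other components, validating requests for the API server, and choosing among three model runners). The sketch below is only an illustration of that description. Every name in it (`EngineConfig`, `SketchPlatform`, `configure`, `validate_request`, `select_model_runner`, and the class-name strings) is hypothetical and is not taken from vLLM or vLLM-Spyre; the rule for picking a runner is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Hypothetical stand-in for the engine configuration."""
    max_model_len: int
    scheduler_cls: str = "DefaultScheduler"
    worker_cls: str = "DefaultWorker"


class SketchPlatform:
    """Hypothetical platform object; not the real vLLM platform interface."""

    def configure(self, config: EngineConfig) -> None:
        # Bootstrapping: point the engine core and the workers at
        # backend-specific classes (names are placeholders).
        config.scheduler_cls = "SketchSpyreScheduler"
        config.worker_cls = "SketchSpyreWorker"

    def validate_request(self, prompt_len: int, config: EngineConfig) -> None:
        # Validation hook: reject requests the backend could not handle.
        if prompt_len > config.max_model_len:
            raise ValueError(
                f"prompt length {prompt_len} exceeds limit {config.max_model_len}"
            )


def select_model_runner(continuous_batching: bool, pooling: bool) -> str:
    # Assumed selection rule covering the three cases named in the doc:
    # static batching, continuous batching, and pooling.
    if pooling:
        return "pooling"
    return "continuous_batching" if continuous_batching else "static_batching"


config = EngineConfig(max_model_len=8)
platform = SketchPlatform()
platform.configure(config)
platform.validate_request(prompt_len=4, config=config)  # accepted silently
runner = select_model_runner(continuous_batching=True, pooling=False)
```

If the real interfaces differ from this shape, a short snippet in the doc showing the actual hook names would help new contributors map the diagram onto the code.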
