Commit 3a7cc4b

Documents a bit CB script and tests (#300)
This PR introduces a brief overview on how to debug and test the continuous batching functionality in vLLM. It pinpoints the main testing functions and script for inference with continuous batching.

Signed-off-by: Sophie du Couédic <[email protected]>
1 parent 1d13d62 commit 3a7cc4b

File tree

2 files changed: +123 -0 lines changed


docs/.nav.yml

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ nav:
   - Developer Guide:
     - Contributing: contributing/README.md
     - Continuous Batching:
+      - Overview: contributing/continuous_batching/overview.md
       - Tests:
         - Output Tests: contributing/continuous_batching/tests/output_tests.md
         - Scheduler Steps Tests: contributing/continuous_batching/tests/scheduler_steps_tests.md
@@ -37,6 +38,7 @@ nav:
   - Developer Guide:
     - Contributing: contributing/README.md
     - Continuous Batching:
+      - Overview: contributing/continuous_batching/overview.md
       - Tests:
         - Output Tests: contributing/continuous_batching/tests/output_tests.md
         - Scheduler Steps Tests: contributing/continuous_batching/tests/scheduler_steps_tests.md
docs/contributing/continuous_batching/overview.md

Lines changed: 121 additions & 0 deletions

# Continuous Batching tests / inference scripts in vLLM

Brief overview of what has been implemented so far in vLLM to test / debug continuous batching.

## Inference script

* **File paths:**
    * `examples/offline_inference/cb_spyre_inference.py`
    * `examples/offline_inference/long_context.py`
* **Purpose:** Debugging (i.e. manual execution)

### Description

* Runs inference on a set of prompts with continuous batching enabled (the number of prompts is parametrizable).
* Prints the generated text for each sequence.
* All the requested sequences are defined at the beginning; no requests join the waiting queue while the decoding of other requests is already in progress.
* The exact sequence of prefill and decode steps depends on the parameter values `max_num_seqs`, `num-prompts` and `max-tokens`.
* If `--compare-with-CPU` is set, the output text is compared with that of Hugging Face running on CPU. Note that only the tokens are compared here, not the logprobs.

### Parametrization

For `cb_spyre_inference.py` (see the example invocation below):

* `--model`: the model to run
* `--max_model_len`: maximum length of a sequence (padded prompt plus decoded tokens); cannot exceed the model's maximum context length
* `--max_num_seqs`: maximum number of sequences processed in a single iteration (decode batch size)
* `--tp`: tensor parallelism (number of Spyre cards)
* `--num-prompts`: total number of requested prompts
* `--max-tokens`: number of tokens generated for each requested sequence
* `--compare-with-CPU`: if set, compare the text output with a CPU version running with Hugging Face instead of vLLM

For `long_context.py`: the same parameters, but with some differences:

* `--max_prompt_len`: maximum prompt length (prompts will have lengths up to `max_prompt_len`)
* `--max-tokens` cannot be specified: the number of tokens is set automatically from `max_model_len` and the prompt lengths
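
For reference, here is a possible invocation of `cb_spyre_inference.py` combining the flags listed above. The flag spellings are taken from the list above and the values are purely illustrative; check the script's `--help` for the authoritative set of options and their defaults.

```bash
# Illustrative example: run continuous batching on 8 prompts with a decode
# batch size of 2 and compare the generated text against Hugging Face on CPU.
python examples/offline_inference/cb_spyre_inference.py \
    --model ibm-ai-platform/micro-g3.3-8b-instruct-1b \
    --max_model_len 2048 \
    --max_num_seqs 2 \
    --num-prompts 8 \
    --max-tokens 20 \
    --compare-with-CPU
```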

## CB tests through unit tests

!!! abstract "In Short"
    See the detailed description of the individual unit tests for continuous batching directly in their respective files.

* [Output Tests](tests/output_tests.md): Check the correctness of the final output logits/tokens of sequences run with continuous batching enabled
* [Scheduler Steps Tests](tests/scheduler_steps_tests.md): Check the correctness of the step-by-step execution of continuous batching for different scenarios of prompt lengths and requested tokens
* [Other Tests](tests/other_tests.md): Other tests verifying the various behaviours of vLLM when running with continuous batching enabled

* **Purpose:** Automated execution to verify that a specific behaviour acts as expected (passing/failing)

* **File paths:**
    * Output Tests: `vllm-spyre/tests/e2e/test_spyre_basic.py`
    * Scheduler Steps Tests: `vllm-spyre/tests/e2e/test_spyre_cb_scheduler_steps.py`
    * Other Tests: various files, including `vllm-spyre/tests/e2e/test_spyre_cb.py`

<!-- markdownlint-disable MD031 MD046 -->
### Usage (when running locally)

#### Commands

    # Runs all the tests
    python -m pytest -svx -m "spyre and cb" --forked tests

    # Runs specific test file
    python -m pytest -svx -m "spyre and cb" --forked tests/e2e/test_spyre_cb_scheduler_steps.py

    # Runs specific test function
    python -m pytest -svx -m "spyre and cb" --forked tests/e2e/test_spyre_basic.py::test_output

<!-- markdownlint-enable MD031 MD046 -->

#### Parameters description

* `-x` option: stops the execution as soon as a test fails
* `-s` option: shows all the print statements in the code
* `-v` option: verbose mode, makes the test output more detailed: shows the name of each test function and whether it passed, failed or was skipped
* `--forked` option: isolates the tests, so that one crashing test does not impact the other tests
* `-m "spyre and cb"`: runs only the tests with configurations marked as both "spyre" and "cb"

!!! tip
    To run a test with a different model than the default `ibm-ai-platform/micro-g3.3-8b-instruct-1b`, set the `VLLM_SPYRE_TEST_MODEL_LIST` environment variable to the target model, for example:

    ```bash
    VLLM_SPYRE_TEST_MODEL_LIST='tiny-granite-3.2-8b' python -m pytest -svx -m "spyre and cb" --forked tests/e2e/test_spyre_cb.py
    ```

### Description

Unit tests are designed for automated and systematic execution to verify that CB behaves as expected in different scenarios. For each scenario (i.e. configuration of parameters), the test either passes or fails. When a test suite fails, identifying which specific test case failed is often more informative than the failure message itself. Below is a brief description of the different unit tests targeting CB. The descriptions can also be found in the docstrings of the test functions.

!!! caution
    When adding new parametrization to a test, keep in mind that the parameters combine combinatorially, so the number of executed tests can grow very quickly. For example, the following test function will run 2 x 2 = 4 different scenarios in total:

    ```python
    @pytest.mark.parametrize("model", ["micro-g3.3-8b-instruct-1b", "granite-3.3-8b-instruct"])
    @pytest.mark.parametrize("max_tokens", [[10, 20], [60, 78]])
    def test_function(model: str, max_tokens: list[int]):
        ...
    ```

#### Output Tests

See [Output Tests](tests/output_tests.md)

Output tests check the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themselves are hardcoded, as well as the max requested tokens per prompt (constant and set to 20). The output from vLLM is compared with that of Hugging Face on CPU.

!!! note inline end
    This applies to the sendnn backend; on CPU, the tokens additionally need to be exactly the same for the test to pass.

* The test passes if the logprobs of HF on CPU and of vLLM (on Spyre or CPU, depending on the backend), compared pairwise, all have relative differences below a threshold: `math.isclose(hf_logprob, vllm_logprob, rel_tol=0.35)`. Otherwise it fails. There is no logic that accounts for the generated tokens diverging at some point, which would make the logits diverge as well.
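
To make the pass criterion concrete, here is a minimal sketch of that pairwise comparison, assuming per-token logprob lists for the same sequence from Hugging Face (CPU) and from vLLM. This is illustrative only, not the actual test code.

```python
import math

# Illustrative helper: hf_logprobs and vllm_logprobs are assumed to be the
# per-token logprobs of the same generated sequence, from Hugging Face on CPU
# and from vLLM respectively.
def outputs_match(hf_logprobs: list[float], vllm_logprobs: list[float]) -> bool:
    # The test passes only if every pairwise relative difference is within tolerance.
    return all(
        math.isclose(hf, vllm, rel_tol=0.35)
        for hf, vllm in zip(hf_logprobs, vllm_logprobs)
    )
```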

#### Scheduler Steps Tests

See [Scheduler Steps Tests](tests/scheduler_steps_tests.md)

!!! Question
    For these tests, the final output is not checked, only the correctness of the step-by-step execution. Would it make sense to have output validation too?

Checking the final output correctness alone is not enough to ensure that CB is correctly implemented (otherwise, how could we differentiate it from static batching, for example?). So the scheduler steps tests are meant to check the correctness of the step-by-step execution of continuous batching. They do so by checking, at every engine step (i.e. prefill or decode iteration), a number of attributes (see the sketch after the list below). This allows finer testing of the padding and scheduling implementation.

* **Checked attributes at each step:**
    * `tkv`: after each step, the tkv is compared against the expected tkv value for that step
    * `waiting`, `running`, `request_outputs`, `finished_requests`: not really relevant from a compiler point of view, but after each iteration we check that the lists of running, waiting and finished requests are correct; this tests the scheduler correctness
    * (waiting to be merged, PR #261): `n_reserved_blocks` and `n_used_blocks`
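
As a purely illustrative sketch of the idea (hypothetical names, placeholder values and structure; the actual tests live in `vllm-spyre/tests/e2e/test_spyre_cb_scheduler_steps.py` and may be organised differently), the step-by-step check can be pictured as comparing the observed engine state after each step against a table of expected values:

```python
# Hypothetical expected-state table: one entry per engine step
# (prefill or decode iteration), with the attributes described above.
# The numbers and request ids here are placeholders, not real expectations.
expected_steps = [
    {"step": 0, "tkv": 64, "waiting": ["1"], "running": ["0"], "finished_requests": []},
    {"step": 1, "tkv": 65, "waiting": [], "running": ["0", "1"], "finished_requests": []},
    # ...
]

def check_step(step_idx: int, observed: dict, expected: dict) -> None:
    # Compare every recorded attribute for this step against its expected value.
    for key, expected_value in expected.items():
        if key == "step":
            continue
        assert observed[key] == expected_value, (
            f"step {step_idx}: {key}={observed[key]!r}, expected {expected_value!r}"
        )
```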

#### Other Tests

See [Other Tests](tests/other_tests.md)

Most of the other tests primarily verify the correctness of various behaviours of the vLLM Spyre plugin, such as launching the online server or enforcing scheduler constraints. While they don't always directly target the correctness of continuous batching, they ensure that the system functions as expected when continuous batching is enabled.
