@sducouedic sducouedic commented Jul 10, 2025

This PR introduces a brief overview on how to debug and test the continuous batching functionality in vLLM. It pinpoints the main testing functions and script for inference with continuous batching.

Signed-off-by: Sophie du Couédic <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@sducouedic sducouedic marked this pull request as ready for review July 10, 2025 21:54
@sducouedic sducouedic requested a review from rafvasq as a code owner July 10, 2025 21:54
Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic force-pushed the doc_cb_script_and_tests branch from 7336865 to 120cd73 Compare July 10, 2025 21:56
@sducouedic sducouedic enabled auto-merge (squash) July 11, 2025 11:14
@github-actions github-actions bot added the ready label Jul 11, 2025
@rafvasq rafvasq left a comment

Quick note that the doc needs to be included in .nav to be shown on the site, see #226.

Also wonder if it fits better under Developer Guide and not User Guide since it seems primarily about debugging and testing.

Collaborator

I wonder if all this can be added as docstring to the top of the respective test files themselves instead of having it sit separately (which is prone to being stale and an extra step for anyone trying to understand the code)

Collaborator Author

Having one doc file encompassing the different CB testing/debugging functions is a request from the compiler team actually

Collaborator

Makes sense. I wonder if we can add all the text from this PR into the test files and have some sort of table/visual representation encompassing the various configurations tested within the docs?

I worry that when we change something for the CB tests (which happens so frequently these days), we will forget to update the docs for the same which will lead to outdated stuff pretty fast.

@sducouedic sducouedic Jul 14, 2025

The table/visual is a good idea, and actually also a request :)
But do you have in mind automation that would update the table/visuals in the docs from the parameters found in the test functions? Also, would the table/visual be in addition to the text in the docs, or replace it?

@prashantgupta24 prashantgupta24 Jul 14, 2025

Ideally the visuals would be in addition to the docstrings, but I don't know if this can be automated; it seems complex.

@rafvasq rafvasq Jul 14, 2025

Just to jump in, I also agree that a lot of the info here should be contained in the files themselves, and this CB developer doc could point to where the script and tools are, with high-level descriptions and debugging notes similar to the ones here, including the table you both mentioned.

I also noticed some "TODO"-type notes in this guide. If they're important, they should probably be turned into tracking issues rather than left as notes in the docs.

Signed-off-by: Sophie du Couédic <[email protected]>
sducouedic pushed a commit that referenced this pull request Jul 15, 2025
# Description

Uses `mkdocstrings` to create a doc outlining CB tests and parameters.

Related to #300

---------

Signed-off-by: Rafael Vasquez <[email protected]>
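For context, `mkdocstrings` renders docstrings into a MkDocs page via an identifier directive; a minimal sketch of such a page might look like the following (the file path and module identifier below are hypothetical, for illustration only):

```markdown
<!-- docs/contributing/continuous_batching/overview.md -->
# Continuous Batching Tests

<!-- Pull the test module's docstring into the page;
     "tests.e2e.test_spyre_cb" is a made-up identifier. -->
::: tests.e2e.test_spyre_cb
    options:
      show_source: false
```

This way the prose lives in the test files' docstrings and the docs page stays in sync automatically.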
@sducouedic sducouedic force-pushed the doc_cb_script_and_tests branch from 12556b3 to aa5baec Compare July 16, 2025 11:40
@sducouedic sducouedic disabled auto-merge July 18, 2025 15:19
@prashantgupta24
Collaborator

Do we still need this PR?

Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic force-pushed the doc_cb_script_and_tests branch from 23917c8 to 7c6efb3 Compare July 18, 2025 19:07
Signed-off-by: Sophie du Couédic <[email protected]>
@sducouedic sducouedic force-pushed the doc_cb_script_and_tests branch from d93e182 to b10b8c9 Compare July 18, 2025 19:21
@sducouedic
Collaborator Author

@prashantgupta24
I just shortened it and added links to the docstring, please check: https://vllm-spyre--300.org.readthedocs.build/en/300/contributing/continuous_batching/overview.html

@sducouedic sducouedic enabled auto-merge (squash) July 18, 2025 20:50
@yannicks1 yannicks1 self-requested a review July 21, 2025 14:41
@yannicks1 yannicks1 left a comment

lgtm

@sducouedic sducouedic merged commit 3a7cc4b into main Jul 21, 2025
18 checks passed
@sducouedic sducouedic deleted the doc_cb_script_and_tests branch July 21, 2025 14:41
Comment on lines +1 to +3
# Continuous Batching tests / inference scripts in vLLM

Brief overview of what has been implemented so far in VLLM to test / debug continuous batching

Suggested change
# Continuous Batching tests / inference scripts in vLLM
Brief overview of what has been implemented so far in VLLM to test / debug continuous batching
# Continuous Batching Testing and Debugging
Overview of current tools for testing and debugging continuous batching in vLLM.

* `--max_prompt_len`: maximum prompt length (prompts will have lengths up to `max_prompt_len`)
* `--max-tokens` cannot be specified: the number of generated tokens is set automatically from `max_model_len` and the prompt lengths

## CB tests through unit tests

Suggested change
## CB tests through unit tests
## Unit Tests

Comment on lines +8 to +9
* `examples/offline_inference/cb_spyre_inference.py`
* `examples/offline_inference/long_context.py`

These could be hyperlinked to the scripts in the repo

* **Purpose:** Debugging (i.e., via manual execution)

### Description
* Runs inference on a set of prompts with continuous batching enabled (number of prompts is parametrizable)

Suggested change
* Runs inference on a set of prompts with continuous batching enabled (number of prompts is parametrizable)
* Runs inference on a set of prompts with continuous batching enabled

### Description
* Runs inference on a set of prompts with continuous batching enabled (number of prompts is parametrizable)
* Prints the generated text for each sequence.
* All the requested sequences are defined in the beginning, there is no requests joining the waiting queue while the decoding of some other request has already started.

Suggested change
* All the requested sequences are defined in the beginning, there is no requests joining the waiting queue while the decoding of some other request has already started.
* All requested sequences are defined at the start; no new requests join the queue once decoding has started for others.

* Prints the generated text for each sequence.
* All the requested sequences are defined in the beginning, there is no requests joining the waiting queue while the decoding of some other request has already started.
* The exact sequence of prefill and decode steps depends on the parameter values `max_num_seqs`, `num-prompts`, `max-tokens`.
* If `--compare-with-CPU` is set, then the output text is compared to the one of hugging face, running on CPU. Note that here the logprobs are not compared, only tokens.

Suggested change
* If `--compare-with-CPU` is set, then the output text is compared to the one of hugging face, running on CPU. Note that here the logprobs are not compared, only tokens.
* If `--compare-with-CPU` is set, the output text is compared to that of Hugging Face running on CPU. Note that only the tokens are compared, logprobs are not.
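The `--compare-with-CPU` check described above boils down to a token-by-token equality test against a Hugging Face generation on CPU; a minimal sketch (the function and variable names here are illustrative, not the script's actual API):

```python
# Hypothetical sketch of the token-level check behind --compare-with-CPU:
# only generated token ids are compared, logprobs are ignored.
def outputs_match(vllm_outputs, hf_outputs):
    """Return True iff every sequence's generated token ids match exactly."""
    if len(vllm_outputs) != len(hf_outputs):
        return False
    return all(v == h for v, h in zip(vllm_outputs, hf_outputs))

# Dummy token ids standing in for real generations
vllm_tokens = [[101, 7592, 2088], [101, 2023, 2003]]
hf_tokens = [[101, 7592, 2088], [101, 2023, 2003]]
print(outputs_match(vllm_tokens, hf_tokens))  # → True
```

A mismatch in any sequence (or a differing number of sequences) makes the comparison fail, which is why the check is robust to batching order only if outputs are aligned by request.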


See [Output Tests](tests/output_tests.md)

Output tests checks the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themself are hardcoded, as well as the max requested tokens per prompt (constant and set to 20). The output from vllm is compared to this of Hugging Face on CPU.

Suggested change
Output tests checks the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themself are hardcoded, as well as the max requested tokens per prompt (constant and set to 20). The output from vllm is compared to this of Hugging Face on CPU.
Output tests check the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themselves are hardcoded, as is the maximum number of requested tokens per prompt (constant, set to 20). The output from vLLM is compared to that of Hugging Face on CPU.

@rafvasq
Collaborator

rafvasq commented Jul 21, 2025

Ah, I reviewed too late 😅. I can offer some of these updates myself later.

@sducouedic
Collaborator Author

that would be great @rafvasq thanks
