Documents a bit CB script and tests #300
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre. Now you are good to go 🚀
Force-pushed from 7336865 to 120cd73
Quick note that the doc needs to be included in .nav to be shown on the site, see #226.
Also wonder if it fits better under Developer Guide and not User Guide since it seems primarily about debugging and testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if all this could be added as a docstring at the top of the respective test files instead of sitting separately, which is prone to going stale and adds an extra step for anyone trying to understand the code.
Having one doc file encompassing the different CB testing/debugging functions is actually a request from the compiler team.
Makes sense. I wonder if we can move all the text from this PR into the test files and have some sort of table/visual representation of the various configurations tested within the docs?
I worry that when we change something in the CB tests (which happens frequently these days), we will forget to update the docs to match, which will lead to outdated content pretty fast.
The table/visual is a good idea, and actually also a request :)
But do you have in mind automation that updates the table/visuals in the docs based on the parameters found in the test functions? Also, would the table/visual be in addition to the text in the docs, or replace it?
Ideally the visuals would be in addition to the docstrings, but I don't know if this can be automated; it seems complex.
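If it helps, the generation could be scripted and run in CI or pre-commit so the table cannot drift from the tests. A minimal sketch, assuming each test module exposes its parametrize inputs as module-level constants (lists of dicts); every name below is hypothetical, not the actual test layout:

```python
"""Sketch: regenerate a markdown table of CB test configurations.

Assumes each test module exposes its parametrize inputs as module-level
constants (lists of dicts); every name below is hypothetical.
"""
import importlib

# Hypothetical (module, constant) pairs pointing at parametrize inputs.
TEST_CONFIGS = [
    ("tests.e2e.test_spyre_cb", "CB_CONFIGS"),
]


def build_table(rows: list[dict]) -> str:
    """Render a list of config dicts as a markdown table."""
    headers = sorted({key for row in rows for key in row})
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "---|" * len(headers),
    ]
    for row in rows:
        lines.append(
            "| " + " | ".join(str(row.get(h, "")) for h in headers) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    for module_name, attr in TEST_CONFIGS:
        rows = getattr(importlib.import_module(module_name), attr)
        print(build_table(rows))
```

A check mode that diffs the generated table against the committed docs would turn staleness into a CI failure rather than a silent doc bug.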
Just to jump in, I also agree that a lot of the info here should probably live in the files themselves, and this CB developer doc could point to where the scripts and tools are, with high-level descriptions and debugging notes similar to the ones here, including the table you both mentioned.
I also noticed some "TODO"-type notes in this guide. If they're important, those should probably be turned into issues for tracking rather than left as notes in the docs.
# Description: Uses `mkdocstrings` to create a doc outlining CB tests and parameters. Related to #300. Signed-off-by: Rafael Vasquez <[email protected]>
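As a side note on the mkdocstrings approach: a `:::` directive renders a module's docstring in place, so the test files remain the single source of truth. A minimal sketch (the module path is illustrative, not necessarily the real one):

```markdown
# Continuous Batching Tests

<!-- Module path below is illustrative -->
::: tests.e2e.test_spyre_cb
    options:
      show_source: false
```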
Force-pushed from 12556b3 to aa5baec
Do we still need this PR?
Force-pushed from 23917c8 to 7c6efb3
Force-pushed from d93e182 to b10b8c9
@prashantgupta24
lgtm
> # Continuous Batching tests / inference scripts in vLLM
>
> Brief overview of what has been implemented so far in VLLM to test / debug continuous batching

Suggested change:

> # Continuous Batching Testing and Debugging
>
> Overview of current tools for testing and debugging continuous batching in vLLM.
> * `--max_prompt_len`: max lengths of prompts (prompts will have length up to `max_prompt_len`)
> * doesn't allow to specify `--max-tokens`: number of tokens set automatically given `max_model_len` and prompts lengths
>
> ## CB tests through unit tests

Suggested change:

> ## Unit Tests
> * `examples/offline_inference/cb_spyre_inference.py`
> * `examples/offline_inference/long_context.py`

These could be hyperlinked to the scripts in the repo.
> * **Purpose:** Debugging (ie. using manual execution)
>
> ### Description
> * Runs inference on a set of prompts with continuous batching enabled (number of prompts is parametrizable)

Suggested change:

> * Runs inference on a set of prompts with continuous batching enabled
> ### Description
> * Runs inference on a set of prompts with continuous batching enabled (number of prompts is parametrizable)
> * Prints the generated text for each sequence.
> * All the requested sequences are defined in the beginning, there is no requests joining the waiting queue while the decoding of some other request has already started.

Suggested change:

> * All requested sequences are defined at the start; no new requests join the queue once decoding has started for others.
> * Prints the generated text for each sequence.
> * All the requested sequences are defined in the beginning, there is no requests joining the waiting queue while the decoding of some other request has already started.
> * The exact sequence of prefill and decode steps depends on the parameter values `max_num_seqs`, `num-prompts`, `max-tokens`.
> * If `--compare-with-CPU` is set, then the output text is compared to the one of hugging face, running on CPU. Note that here the logprobs are not compared, only tokens.

Suggested change:

> * If `--compare-with-CPU` is set, the output text is compared to that of Hugging Face running on CPU. Note that only the tokens are compared, logprobs are not.
> See [Output Tests](tests/output_tests.md)
>
> Output tests checks the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themself are hardcoded, as well as the max requested tokens per prompt (constant and set to 20). The output from vllm is compared to this of Hugging Face on CPU.

Suggested change:

> Output tests check the correctness of the output of CB on a set of prompts. For now, the number of prompts and the prompts themselves are hardcoded, as well as the max requested tokens per prompt (constant and set to 20). The output from vLLM is compared to that of Hugging Face on CPU.
Ah, I reviewed too late 😅. I can offer some of these updates myself later.
That would be great @rafvasq, thanks!
This PR introduces a brief overview of how to debug and test the continuous batching functionality in vLLM. It pinpoints the main testing functions and scripts for inference with continuous batching.