[CI] Test against vllm/main #70
- Install Python package on GH action runner instead of building Docker image
- Use uv instead of pip as package manager
- Use matrix to test against different build targets
- Use matrix strategy to run v0 and v1 tests in parallel

Resolves vllm-project#41

Signed-off-by: Christian Kadner <[email protected]>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Or this can be done with
Now you are good to go 🚀
There is potential for another speed-up of about 10 seconds if we cache the entire installed Python site-packages directory after PyTorch has been installed. The cache key would depend on the PyTorch version, Python version, and OS.
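A minimal sketch of what such a cache step could look like (the .venv path, matrix.python_version, and env.TORCH_VERSION are assumptions for illustration, not taken from this PR):

- name: "Cache Python site-packages"
  uses: actions/cache@v4
  with:
    # assumed location of the uv-created virtual environment; adjust to the actual path
    path: .venv
    # key invalidates whenever the OS, Python version, or PyTorch version changes
    key: site-packages-${{ runner.os }}-py${{ matrix.python_version }}-torch${{ env.TORCH_VERSION }}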
@joerunde do you have the powers to kick off a workflow run?
.github/workflows/test-spyre.yml

uv pip install -v -e .
uv sync --frozen --group dev
- name: "Download models"
Bonus points if we could cache these, but definitely not necessary for this PR
Minus points actually :-)

- the download time from the GHA cache is about equal to the download time from HF using the Python processes
- the two models take up about 1.8 GB of cache (against the 10 GB limit)

GHA cache makes the most sense for operations that cost a lot of compute time, not when the time is spent on downloads.
I can speed up the HF download times by a few seconds by running the two Python processes in "parallel":
- name: "Download models"
  run: |
    mkdir -p "${VLLM_SPYRE_TEST_MODEL_DIR}"
    # each helper downloads a model into the HF hub cache, then symlinks
    # its snapshot directory into the test model directory
    download_jackfram_llama() {
      python -c "from transformers import pipeline; pipeline('text-generation', model='JackFram/llama-160m')"
      VARIANT=$(ls "${HF_HUB_CACHE}/models--JackFram--llama-160m/snapshots/")
      ln -s "${HF_HUB_CACHE}/models--JackFram--llama-160m/snapshots/${VARIANT}" "${VLLM_SPYRE_TEST_MODEL_DIR}/llama-194m"
    }
    download_roberta_large() {
      python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-roberta-large-v1')"
      VARIANT=$(ls "${HF_HUB_CACHE}/models--sentence-transformers--all-roberta-large-v1/snapshots/")
      ln -s "${HF_HUB_CACHE}/models--sentence-transformers--all-roberta-large-v1/snapshots/${VARIANT}" "${VLLM_SPYRE_TEST_MODEL_DIR}/all-roberta-large-v1"
    }
    # run both downloads as background jobs and wait for both to finish
    download_jackfram_llama &
    download_roberta_large &
    wait
Ah, nice!
I was thinking more along the lines of reliability rather than speed here, since the upstream vLLM CI downloads a ton of models in parallel from HF and often flakes out when a download fails. But this test suite is still small enough that it's probably fine to keep pulling from HF for now. We can always switch to the GHA cache if it becomes a problem.
I see your point :-)
https://github.com/vllm-project/vllm-spyre/actions/runs/14342866903/job/40206277191?pr=70#step:6:34
huggingface_hub.errors.HfHubHTTPError: 403 Forbidden: None.
Cannot access content at: https://huggingface.co/JackFram/llama-160m/resolve/main/config.json.
Make sure your token has the correct permissions.
I will do the hub cache
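For reference, a hedged sketch of caching the hub directory with actions/cache; the key choice, hashing the workflow file as a stand-in for "the model list changed", is an assumption:

- name: "Cache HF models"
  uses: actions/cache@v4
  with:
    path: ${{ env.HF_HUB_CACHE }}
    # any file that enumerates the test models would work as the key input
    key: hf-models-${{ hashFiles('.github/workflows/test-spyre.yml') }}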
oh lol, that was fast!
And of course any comments about the limitations would be great so the next maintainer knows not to try to stick a 7GB model in here
@joerunde -- took me a bit to get cache updates to work properly with immutable caches. I pushed another commit that should:

- only create cache blobs for one of the matrix jobs
- not create cache blobs for PR branches
- update cache blobs on push to main when new models get added or old ones removed

One way to express those rules is sketched below.
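A minimal sketch using the actions/cache/restore and actions/cache/save split; the step id, key choice, and matrix.test_suite value are assumptions, not necessarily what the commit does:

- name: "Restore HF models from cache"
  id: hf-cache
  uses: actions/cache/restore@v4
  with:
    path: ${{ env.HF_HUB_CACHE }}
    key: hf-models-${{ hashFiles('.github/workflows/test-spyre.yml') }}

# ... download models only if the cache missed ...

- name: "Save HF models to cache"
  uses: actions/cache/save@v4
  # write only on a cache miss, only on pushes to main, and only from one matrix job
  if: steps.hf-cache.outputs.cache-hit != 'true' && github.ref == 'refs/heads/main' && matrix.test_suite == 'V0'
  with:
    path: ${{ env.HF_HUB_CACHE }}
    key: ${{ steps.hf-cache.outputs.cache-primary-key }}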
Thanks @ckadner! I mainly just had a lot of little questions; overall this looks super great.
- Remove extra uv settings:
  - Don't prefer PyTorch package index
  - Don't force PyPI for markupsafe package
- Download HF models in parallel

Signed-off-by: Christian Kadner <[email protected]>
cache HF models
Signed-off-by: Christian Kadner <[email protected]>
Looks like we need another main merge and to test against 0.8.3
update to vLLM:v0.8.3
Signed-off-by: Christian Kadner <[email protected]>
@joerunde -- Merged. Updated to 0.8.3. Should I make the
don't explicitly set vLLM 0.8.x version
Signed-off-by: Christian Kadner <[email protected]>
I made a quick change to update the job matrix to test against the project default instead of a pinned version. A prettier version of this workflow could have a pre-test job to pull the actual version out of the project's dependency spec.
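A hedged sketch of such a pre-test job; the job name, the assumption that the pin lives in pyproject.toml, and the grep pattern are all illustrative:

resolve-vllm-version:
  runs-on: ubuntu-latest
  outputs:
    vllm_version: ${{ steps.get_version.outputs.version }}
  steps:
    - uses: actions/checkout@v4
    - name: "Extract pinned vLLM version"
      id: get_version
      # assumes a vllm==X.Y.Z pin in pyproject.toml; fall back to main if absent
      run: |
        VERSION=$(grep -oP 'vllm==\K[0-9.]+' pyproject.toml || echo "main")
        echo "version=${VERSION}" >> "$GITHUB_OUTPUT"

The test job's matrix could then consume needs.resolve-vllm-version.outputs.vllm_version instead of a hard-coded version string.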
* [CI] Test against vllm/main
  - Install Python package on GH action runner instead of building Docker image
  - Use uv instead of pip as package manager
  - Use matrix to test against different build targets
  - Use matrix strategy to run v0 and v1 tests in parallel
  Resolves vllm-project#41
* review updates
  - Remove extra uv settings: don't prefer PyTorch package index, don't force PyPI for markupsafe package
  - Download HF models in parallel
* cache HF models
* update to vLLM:v0.8.3
* don't explicitly set vLLM 0.8.x version

---------

Signed-off-by: Christian Kadner <[email protected]>
Changes:

- Install Python package on GH action runner instead of building Docker image
- Use uv instead of pip as package manager
- Use matrix to test against different build targets
- Use matrix strategy to run v0 and v1 tests in parallel

Resolves #41

TODO:

- remove test-spyre as merge requirement
- add Test/V0 (vLLM:0.8.3), Test/V1 (vLLM:0.8.3), Test/V0 (vLLM:main), Test/V1 (vLLM:main) as a required check