
Conversation


@ckadner ckadner commented Apr 4, 2025

Changes:

  • Install Python package on GH action runner instead of building Docker image
  • Use uv instead of pip as package manager
  • Use matrix to test against different build targets
  • Use matrix strategy to run v0 and v1 tests in parallel

Resolves #41


TODO:

  • AFTER MERGE: Update the branch protection rules that use test-spyre as a merge requirement:
    • Remove test-spyre
    • Add Test/V0 (vLLM:0.8.3), Test/V1 (vLLM:0.8.3)
    • Leave off Test/V0 (vLLM:main) and Test/V1 (vLLM:main) as required checks
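
For illustration, a minimal sketch of the matrix strategy described above (the exact workflow syntax, runner, setup steps, and test invocation are assumptions, not the actual workflow file in this PR; only the check names follow the TODO list):

name: Test

on: [pull_request, push]

jobs:
  test:
    # hypothetical sketch -- runner, setup, and pytest invocation are assumptions
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        vllm_version: ["0.8.3", "main"]  # build targets to test against
        test_suite: ["V0", "V1"]         # run v0 and v1 tests in parallel
    name: "${{ matrix.test_suite }} (vLLM:${{ matrix.vllm_version }})"
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv sync --frozen --group dev
      - run: uv run pytest tests/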

github-actions bot commented Apr 4, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀


ckadner commented Apr 4, 2025

There is potential for another speed-up of about 10 seconds if we cache the entire installed Python site-packages directory after PyTorch is installed. The cache key would depend on the PyTorch version, Python version, and OS.
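
For illustration, such a cache step could look roughly like this (the venv path and the version variables are assumptions, not the actual workflow values):

      - name: "Cache Python site-packages"
        # hypothetical sketch -- cache site-packages after PyTorch is installed;
        # the path and version variables below are assumptions
        uses: actions/cache@v4
        with:
          path: .venv/lib/python${{ env.PYTHON_VERSION }}/site-packages
          key: site-packages-${{ runner.os }}-py${{ env.PYTHON_VERSION }}-torch${{ env.TORCH_VERSION }}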


ckadner commented Apr 4, 2025

@joerunde do you have the powers to kick off a workflow run?

uv pip install -v -e .
uv sync --frozen --group dev
- name: "Download models"
Collaborator

Bonus points if we could cache these, but definitely not necessary for this PR

@ckadner ckadner Apr 8, 2025

Minus points actually :-)

  • the download time from GHA cache is about equal to the download time from HF using the Python processes

    • restoring from GHA cache: 21s
    • downloading from HF: 19s
  • the two models take up about 1.8 GB of cache (10 GB limit)

The GHA cache makes the most sense for operations that cost a lot of compute time, not for time spent on downloads.

@ckadner ckadner Apr 8, 2025

I can speed up the HF download times by a few seconds by running the two Python processes in "parallel":

      - name: "Download models"
        run: |
          mkdir -p "${VLLM_SPYRE_TEST_MODEL_DIR}"
          download_jackfram_llama() {
            python -c "from transformers import pipeline; pipeline('text-generation', model='JackFram/llama-160m')"
            VARIANT=$(ls "${HF_HUB_CACHE}/models--JackFram--llama-160m/snapshots/")
            ln -s "${HF_HUB_CACHE}/models--JackFram--llama-160m/snapshots/${VARIANT}" "${VLLM_SPYRE_TEST_MODEL_DIR}/llama-194m"
          }
          download_roberta_large() {
            python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-roberta-large-v1')"
            VARIANT=$(ls "${HF_HUB_CACHE}/models--sentence-transformers--all-roberta-large-v1/snapshots/")
            ln -s "${HF_HUB_CACHE}/models--sentence-transformers--all-roberta-large-v1/snapshots/${VARIANT}" "${VLLM_SPYRE_TEST_MODEL_DIR}/all-roberta-large-v1"
          }
          download_jackfram_llama &
          download_roberta_large &
          wait
  • in sequence: 22s
  • in parallel: 16s

Collaborator

Ah, nice!

I was thinking more along the lines of reliability rather than speed here, since the upstream vLLM CI downloads a ton of models in parallel from HF and often flakes out when a download fails. But this test suite is still small enough that it's probably fine to keep pulling from HF for now. We can always switch to the GHA cache if it becomes a problem.

Collaborator Author

I see your point :-)

https://github.com/vllm-project/vllm-spyre/actions/runs/14342866903/job/40206277191?pr=70#step:6:34

huggingface_hub.errors.HfHubHTTPError: 403 Forbidden: None.
Cannot access content at: https://huggingface.co/JackFram/llama-160m/resolve/main/config.json.
Make sure your token has the correct permissions.

I will do the hub cache

Collaborator

oh lol, that was fast!

And of course any comments about the limitations would be great, so the next maintainer knows not to try to stick a 7 GB model in here.

Collaborator Author

@joerunde -- took me a bit to get cache updates to work properly with immutable caches. I pushed another commit that should:

  • only create cache blobs for one of the matrix jobs
  • not create cache blobs for PR branches
  • update cache blobs on push to main when new models get added or old ones are removed
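
Roughly, the restore/save split could look like the sketch below (the workflow file name, cache key, and matrix conditions are assumptions meant to illustrate the three points above, not the exact commit):

      - name: "Restore HF models from cache"
        uses: actions/cache/restore@v4
        with:
          path: ${{ env.HF_HUB_CACHE }}
          # keying on the workflow file refreshes the blob when the model
          # download steps change (models added or removed)
          key: hf-hub-cache-${{ hashFiles('.github/workflows/test.yml') }}

      # write the (immutable) cache blob from only one matrix job, and only
      # on pushes to main -- never from PR branches
      - name: "Save HF models to cache"
        if: github.event_name == 'push' && github.ref == 'refs/heads/main' && matrix.test_suite == 'V0' && matrix.vllm_version == 'main'
        uses: actions/cache/save@v4
        with:
          path: ${{ env.HF_HUB_CACHE }}
          key: hf-hub-cache-${{ hashFiles('.github/workflows/test.yml') }}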


joerunde commented Apr 7, 2025

Thanks @ckadner!

I mainly just had a lot of little questions; overall this looks super great.

ckadner added 3 commits April 8, 2025 13:08
- Remove extra uv settings:
  - Don't prefer PyTorch package index
  - Don't force PyPI for markupsafe package
- Download HF models in parallel

Signed-off-by: Christian Kadner <[email protected]>
Signed-off-by: Christian Kadner <[email protected]>

joerunde commented Apr 9, 2025

Looks like we need another main merge and to test against 0.8.3



ckadner commented Apr 9, 2025

Looks like we need another main merge and to test against 0.8.3

@joerunde -- Merged. Updated to 0.8.3.

Should I make the 0.8.x matrix version dynamic, i.e., read it from pyproject.toml?

I made a quick change to update the job matrix to test against the project default version of vLLM and vLLM:main.

A prettier version of this workflow could have a pre-test job that pulls the actual version out of pyproject.toml and then sets the job matrix name properly (see the sketch below). But using the default is better, as the repository branch protection rules cannot have dynamic merge requirements.
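
For reference, the pre-test-job variant (decided against here) could look roughly like this; the job names, the pyproject.toml pin format, and the grep extraction are assumptions:

jobs:
  vllm-version:
    # hypothetical sketch -- reads the pinned vLLM version out of pyproject.toml
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.get-version.outputs.version }}
    steps:
      - uses: actions/checkout@v4
      - id: get-version
        run: |
          # assumes a pin like 'vllm == 0.8.3' somewhere in pyproject.toml
          VERSION=$(grep -oE 'vllm *== *[0-9.]+' pyproject.toml | grep -oE '[0-9.]+$')
          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"

  test:
    needs: vllm-version
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # a dynamic value makes the resulting check names dynamic as well,
        # which is why the project default was used instead
        vllm_version: ["${{ needs.vllm-version.outputs.version }}", "main"]
        test_suite: ["V0", "V1"]
    name: "${{ matrix.test_suite }} (vLLM:${{ matrix.vllm_version }})"
    steps:
      - run: echo "testing against vLLM ${{ matrix.vllm_version }}"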

@joerunde joerunde merged commit 7e52a88 into vllm-project:main Apr 9, 2025
9 of 11 checks passed
rafvasq pushed a commit to rafvasq/vllm-spyre that referenced this pull request Apr 11, 2025
* [CI] Test against vllm/main

- Install Python package on GH action runner instead of
  building Docker image
- Use uv instead of pip as package manager
- Use matrix to test against different build targets
- Use matrix strategy to run v0 and v1 tests in parallel

Resolves vllm-project#41

Signed-off-by: Christian Kadner <[email protected]>

* review updates

- Remove extra uv settings:
  - Don't prefer PyTorch package index
  - Don't force PyPI for markupsafe package
- Download HF models in parallel

Signed-off-by: Christian Kadner <[email protected]>

* cache HF models

Signed-off-by: Christian Kadner <[email protected]>

* update to vLLM:v0.8.3

Signed-off-by: Christian Kadner <[email protected]>

* don't explicitly set vLLM 0.8.x version

Signed-off-by: Christian Kadner <[email protected]>

---------

Signed-off-by: Christian Kadner <[email protected]>