Conversation

Collaborator

@ckadner ckadner commented Oct 13, 2025

Description

This PR solves two problems:

  1. Allow each model used in unit tests to be pinned to (and cached at) a specific revision (see spyre_util.py)
  2. Split the combined model cache into separate caches per model

Bonus: use a dedicated GHA cache blob per model (and revision) that can also be used locally.

Options considered for solving the multi-model cache blob problem:

  1. Subdivide test cases further so that no group of tests uses more than one model
  2. Add a separate cache action for the second model (a 3rd for a 3rd model, ...)
    • use a default model plus a specialized (2nd) model
  3. Use the multi-model cache blob for all other test groups that only use one of the cached models
    • wastes time downloading 3 GB when only 1.2 GB or 1.7 GB is needed
  4. Use the GHA cache API directly (more complicated and brittle)

Revisions used now (consistently):

from tests/spyre_util.py:

tinygranite = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    revision="6e9c6465a9d7e5e9fa35004a29f0c90befa7d23f"
)
model = ModelInfo(name="sentence-transformers/all-roberta-large-v1",
                  revision="cf74d8acd4f198de950bf004b262e6accfed5d2c")
model = ModelInfo(name="cross-encoder/stsb-roberta-large",
                  revision="2b12c2c0088918e76151fd5937b7bba986ef1f98")
tinygranite_fp8 = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8",
    revision="0dff8bacb968836dbbc7c2895c6d9ead0a05dc9e"
)
# granite = ModelInfo(name="ibm-granite/granite-3.3-8b-instruct",
#                     revision="51dd4bc2ade4059a6bd87649d68aa11e4fb2529b")
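The per-model, per-revision cache blob idea above can be sketched as follows. This is an editor's illustration, not the PR's actual code: `cache_key` and the `hf-` prefix are hypothetical names; only the `ModelInfo` fields and the '/'-to-'--' replacement come from the PR itself.

```python
# Hypothetical sketch: derive a GHA cache key that is unique per model AND
# revision, replacing '/' (not usable in cache keys or file names) with '--'.
from dataclasses import dataclass

@dataclass
class ModelInfo:
    name: str
    revision: str

def cache_key(model: ModelInfo) -> str:
    """Build a cache key unique per model and revision (illustrative)."""
    return f"hf-{model.name.replace('/', '--')}-{model.revision[:7]}"

tinygranite = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    revision="6e9c6465a9d7e5e9fa35004a29f0c90befa7d23f",
)
print(cache_key(tinygranite))
# hf-ibm-ai-platform--micro-g3.3-8b-instruct-1b-6e9c646
```

Because the revision is part of the key, bumping a model's pinned revision automatically invalidates only that model's cache blob, not the others.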

Related Issues

#522
#499

Signed-off-by: Christian Kadner <[email protected]>
@ckadner
Collaborator Author

ckadner commented Oct 13, 2025

This PR is a continuation of PR #499, rebased onto main.

@ckadner ckadner mentioned this pull request Oct 13, 2025
Comment on lines +161 to +162
# replace '/' characters in HF_MODEL with '--' for GHA cache keys and
# in model file names in local HF hub cache
Collaborator

A comment showing what this looks like would be nice, e.g. what the name was before and what it looks like after the replacement.
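For illustration (editor's sketch, not part of the PR): the replacement turns a repo id like `cross-encoder/stsb-roberta-large` into `cross-encoder--stsb-roberta-large`, which also matches the `models--org--name` directory layout the HF hub cache itself uses (visible in the log paths quoted later in this thread).

```python
# Illustrative only: '/' is replaced with '--' so the repo id is safe to use
# in GHA cache keys and in file names, mirroring the HF hub cache layout.
repo_id = "cross-encoder/stsb-roberta-large"
safe_name = repo_id.replace("/", "--")
print(safe_name)               # cross-encoder--stsb-roberta-large
print(f"models--{safe_name}")  # models--cross-encoder--stsb-roberta-large
```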

Signed-off-by: Christian Kadner <[email protected]>
@@ -0,0 +1,63 @@
#!/usr/bin/env python3
Collaborator

Why is this tool required if you can use huggingface-cli download? Is it to make sure that only the necessary and sufficient set of files is downloaded?

Collaborator Author

@ckadner ckadner Oct 14, 2025

We are using it for the GitHub Action workflow to download models with revisions (unless they are already cached in a GHA cache blob).

#499 (comment)

I recall being told, and seeing it in comments, that the HuggingFace CLI is not reliable during GitHub Actions runs, though I have never put that to the test myself.

Collaborator

Worth a shot? I also seem to remember that the HF CLI was downloading something weird at one point, but I don't see that lately. Maybe something got fixed?

Collaborator Author

@ckadner ckadner Oct 14, 2025

I recall being told, and seeing it in comments, that the HuggingFace CLI is not reliable during GitHub Actions runs, though I have never put that to the test myself.

Worth a shot? I also seem to remember that the HF CLI was downloading something weird at one point, but I don't see that lately. Maybe something got fixed?

I tried using the hf download action and it failed in 3 of 10 test jobs:

https://github.com/ckadner/vllm-spyre/actions/runs/18510700226/job/52750674898?pr=20

 Traceback (most recent call last):
  File "/home/runner/work/vllm-spyre/vllm-spyre/.venv/bin/huggingface-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/runner/work/vllm-spyre/vllm-spyre/.venv/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py", line 61, in main

Collaborator Author

@ckadner ckadner Oct 14, 2025

Oh, System.IO.IOException: No space left on device.

[EDIT]
I wonder if the HF CLI download (temporarily) creates/keeps two copies of the files while the download is in progress, which would exceed the available disk space on the GHA runner?

Collaborator

hmmm

Collaborator Author

@ckadner ckadner Oct 14, 2025

One more time with hf download --max-workers 2 ...

Downloading 'pytorch_model.bin' to '/home/runner/work/vllm-spyre/vllm-spyre/.cache/huggingface/hub/models--cross-encoder--stsb-roberta-large/blobs/03023f7dcd714c15ff27d534432a80d3bff78c9b50778a44b10585ef5fa7fd25.incomplete'
/home/runner/work/vllm-spyre/vllm-spyre/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:801: UserWarning: Not enough free disk space to download the file. The expected file size is: 1421.62 MB. The target location /home/runner/work/vllm-spyre/vllm-spyre/.cache/huggingface/hub/models--cross-encoder--stsb-roberta-large/blobs only has 1414.68 MB free disk space.

Collaborator Author

@ckadner ckadner Oct 14, 2025

We could probably figure out how the space gets used, which files to exclude and/or how to make more space on the GHA runner.

But we already have the download script and the code in it has been running fine for several months. So, I vote for keeping the existing custom download code, albeit in a separate script now.

Collaborator

Yes, repositories often have extra files such as model weights in other formats. I think the script only downloads the required files, which would explain why it doesn't run out of disk space. I would prefer less code to maintain, but given these space restrictions I guess for now it's better to keep the script.

Collaborator

@maxdebayser maxdebayser left a comment

LGTM

@ckadner ckadner merged commit 7319431 into vllm-project:main Oct 15, 2025
18 checks passed