Conversation

Collaborator

@ckadner ckadner commented Oct 13, 2025

Description

This PR solves two problems:

  1. Allow each model used in unit tests to be pinned to (and cached at) a specific revision (see spyre_util.py)
  2. Split the combined model cache into separate caches per model

Bonus: use a dedicated GHA cache blob per model (and revision) that can also be used locally.

Options considered for solving the multi-model cache blob problem:

  1. Subdivide test cases further so that no group of tests uses more than one model
  2. Add a separate cache action for the second model (a 3rd for a 3rd model, ...)
    • use a default model plus a specialized (2nd) model
  3. Use the multi-model cache blob for all other test groups that only use one of the cached models
    • wastes time downloading 3 GB when only 1.2 GB or 1.7 GB is needed
  4. Use the GHA cache API directly (more complicated and brittle)

Revisions used now (consistently):

from tests/spyre_util.py:

tinygranite = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    revision="6e9c6465a9d7e5e9fa35004a29f0c90befa7d23f"
)
model = ModelInfo(name="sentence-transformers/all-roberta-large-v1",
                  revision="cf74d8acd4f198de950bf004b262e6accfed5d2c")
model = ModelInfo(name="cross-encoder/stsb-roberta-large",
                  revision="2b12c2c0088918e76151fd5937b7bba986ef1f98")
tinygranite_fp8 = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8",
    revision="0dff8bacb968836dbbc7c2895c6d9ead0a05dc9e"
)
# granite = ModelInfo(name="ibm-granite/granite-3.3-8b-instruct",
#                     revision="51dd4bc2ade4059a6bd87649d68aa11e4fb2529b")
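The per-model, per-revision cache blob idea above can be sketched as follows. This is an editor's illustration, not the PR's actual code: `cache_key` and the `hf-` prefix are hypothetical names; only the `ModelInfo` fields and the '/'-to-'--' replacement come from the PR itself.

```python
# Hypothetical sketch: derive a GHA cache key that is unique per model AND
# revision, replacing '/' (not usable in cache keys or file names) with '--'.
from dataclasses import dataclass

@dataclass
class ModelInfo:
    name: str
    revision: str

def cache_key(model: ModelInfo) -> str:
    """Build a cache key unique per model and revision (illustrative)."""
    return f"hf-{model.name.replace('/', '--')}-{model.revision[:7]}"

tinygranite = ModelInfo(
    name="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    revision="6e9c6465a9d7e5e9fa35004a29f0c90befa7d23f",
)
print(cache_key(tinygranite))
# hf-ibm-ai-platform--micro-g3.3-8b-instruct-1b-6e9c646
```

Because the revision is part of the key, bumping a model's pinned revision automatically invalidates only that model's cache blob, not the others.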

Related Issues

#522
#499

Signed-off-by: Christian Kadner <[email protected]>
@ckadner
Collaborator Author

ckadner commented Oct 13, 2025

This PR is a continuation of PR #499, rebased onto main.

@ckadner ckadner mentioned this pull request Oct 13, 2025
Comment on lines +161 to +162
# replace '/' characters in HF_MODEL with '--' for GHA cache keys and
# in model file names in local HF hub cache
Collaborator

A comment showing what this looks like would be nice, e.g. what the name was before and what it looks like after the replacement.
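For illustration (editor's sketch, not part of the PR): the replacement turns a repo id like `cross-encoder/stsb-roberta-large` into `cross-encoder--stsb-roberta-large`, which also matches the `models--org--name` directory layout the HF hub cache itself uses (visible in the log paths quoted later in this thread).

```python
# Illustrative only: '/' is replaced with '--' so the repo id is safe to use
# in GHA cache keys and in file names, mirroring the HF hub cache layout.
repo_id = "cross-encoder/stsb-roberta-large"
safe_name = repo_id.replace("/", "--")
print(safe_name)               # cross-encoder--stsb-roberta-large
print(f"models--{safe_name}")  # models--cross-encoder--stsb-roberta-large
```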

Signed-off-by: Christian Kadner <[email protected]>
@@ -0,0 +1,63 @@
#!/usr/bin/env python3
Collaborator

Why is this tool required if you can use huggingface-cli download? Is it to make sure that only the necessary and sufficient set of files is downloaded?

Collaborator Author

@ckadner ckadner Oct 14, 2025

We are using it for the GitHub Action workflow to download models with revisions (unless they are already cached in a GHA cache blob).

#499 (comment)

I recall being told, and seeing it in comments, that the HuggingFace CLI is not reliable during GitHub Actions runs, though I have never put that to the test myself.

Collaborator

Worth a shot? I also seem to remember that the HF CLI was downloading something weird at one point, but I don't see that lately. Maybe something got fixed?

Collaborator Author

@ckadner ckadner Oct 14, 2025

I recall being told, and seeing it in comments, that the HuggingFace CLI is not reliable during GitHub Actions runs, though I have never put that to the test myself.

Worth a shot? I also seem to remember that the HF CLI was downloading something weird at one point, but I don't see that lately. Maybe something got fixed?

I tried using the hf download action and it failed in 3 of 10 test jobs:

https://github.com/ckadner/vllm-spyre/actions/runs/18510700226/job/52750674898?pr=20

 Traceback (most recent call last):
  File "/home/runner/work/vllm-spyre/vllm-spyre/.venv/bin/huggingface-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/runner/work/vllm-spyre/vllm-spyre/.venv/lib/python3.12/site-packages/huggingface_hub/commands/huggingface_cli.py", line 61, in main

Collaborator Author

@ckadner ckadner Oct 14, 2025

Oh, System.IO.IOException: No space left on device.

[EDIT]
I wonder if the HF CLI download (temporarily) creates/keeps two copies of the files while the download is in progress, which would exceed the available disk space on the GHA runner?

Collaborator

hmmm

Collaborator Author

@ckadner ckadner Oct 14, 2025

One more time with hf download --max-workers 2 ...

Downloading 'pytorch_model.bin' to '/home/runner/work/vllm-spyre/vllm-spyre/.cache/huggingface/hub/models--cross-encoder--stsb-roberta-large/blobs/03023f7dcd714c15ff27d534432a80d3bff78c9b50778a44b10585ef5fa7fd25.incomplete'
/home/runner/work/vllm-spyre/vllm-spyre/.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:801: UserWarning: Not enough free disk space to download the file. The expected file size is: 1421.62 MB. The target location /home/runner/work/vllm-spyre/vllm-spyre/.cache/huggingface/hub/models--cross-encoder--stsb-roberta-large/blobs only has 1414.68 MB free disk space.

Collaborator Author

@ckadner ckadner Oct 14, 2025

We could probably figure out how the space gets used, which files to exclude and/or how to make more space on the GHA runner.

But we already have the download script and the code in it has been running fine for several months. So, I vote for keeping the existing custom download code, albeit in a separate script now.

Collaborator

Yes, repositories often have extra files such as model weights in other formats. I think the script only downloads the required files, which would explain why it doesn't run out of disk space. I would prefer less code to maintain, but given these space restrictions I guess for now it's better to keep the script.

Collaborator

@maxdebayser maxdebayser left a comment

LGTM

@ckadner ckadner merged commit 7319431 into vllm-project:main Oct 15, 2025
18 checks passed