
Conversation

@joerunde
Collaborator

Description

In an effort to reduce test runtime, this uses functools.lru_cache to cache text generation results from hf transformers. This should shave about 15% off the cpu runtime based on some quick measurements on an M3.

NB: This probably will not work when running with --forked, but we don't fork the tests on github actions runs.
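
For illustration, here is a minimal sketch of the approach, assuming a hypothetical helper that wraps transformers generation (the function name, model name, and parameters are placeholders, not the actual test code):

import functools

from transformers import AutoModelForCausalLM, AutoTokenizer

@functools.lru_cache
def generate_hf_output(model_name: str, prompt: str, max_new_tokens: int) -> str:
    """Run a prompt through a transformers model; repeated calls with the same
    (model_name, prompt, max_new_tokens) are served from the in-process cache."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

Because the cache lives in the test process's memory, forked workers (as with --forked) each start with an empty cache, which is why the caveat above matters.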

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@jberkhahn
Collaborator

Looks good to me! I'm seeing the same intermittent test failures on my PR, though; not sure what they're from?

Collaborator

@maxdebayser maxdebayser left a comment

nice!

@joerunde
Collaborator Author

@jberkhahn the failures on all vLLM:main jobs are expected; they're just a signal that something has changed in vllm that we need to address. Catching those early like this helps us be ready when a new vllm version is released.

It does look like I missed a list -> tuple conversion in the continuous batching tests though :(
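
For context: functools.lru_cache hashes its arguments to build the cache key, so list arguments raise TypeError: unhashable type: 'list' and must be converted to tuples first. A minimal illustration with hypothetical names, not the suite's actual code:

import functools

@functools.lru_cache
def cached_generate(model_name: str, prompts: tuple[str, ...], max_tokens: int):
    ...  # run the prompts through transformers here

prompts = ["Hello, my name is", "The capital of France is"]
# Passing the list directly would fail because lists aren't hashable,
# so convert it to a tuple before calling the cached function.
cached_generate("my-test-model", tuple(prompts), 20)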

# This uses lru_cache to cache the generated text so that we don't have to
# always load and run the transformers model, nor manage a set of files of
# expected results.
@functools.lru_cache
Collaborator

Should we limit the cache size?

https://docs.python.org/3/library/functools.html#functools.lru_cache

@functools.lru_cache(maxsize=128)

Though I don't suppose there's a reason to be concerned about a growing cache?

Collaborator Author

Yeah, I wasn't too concerned because this should be caching relatively small objects.

@joerunde
Collaborator Author

🤔 This also doesn't appear to be reducing the runtime on github actions at all, which is a bit odd. Something seems off

Signed-off-by: Joe Runde <[email protected]>
@prashantgupta24
Collaborator

🤔 This also doesn't appear to be reducing the runtime on github actions at all, which is a bit odd. Something seems off

I was hoping the unhashable error was the reason the caching wasn't working

@joerunde
Collaborator Author

Ah, so the reduction in time here is actually much better when the static batching and continuous batching tests are run together, since they use the same prompts and would share the cache of expected results. Run separately, there are far fewer cache hits :(

Also, tests that generate specific-length prompts on the fly, like the tkv scheduler tests, don't benefit either. So the speedup here isn't great, but it could be better if we manage to do things like:

  • Run sb and cb tests together
  • Standardize the prompts we run more across tests
  • Pre-fetch prompts to be used in the suite and run them through hf up-front (see the sketch below)
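
As a rough sketch of the pre-fetch idea, a session-scoped pytest fixture could push a standardized prompt set through transformers once per run (the fixture name, prompt constant, and helper here are hypothetical, building on the sketch in the description above):

import pytest

# Hypothetical shared prompt set, reused by both static and continuous
# batching tests so their expected results hit the same cache entries.
STANDARD_PROMPTS = (
    "Hello, my name is",
    "The capital of France is",
)

@pytest.fixture(scope="session")
def hf_expected_results():
    """Run every standard prompt through transformers once per test session."""
    # generate_hf_output is the cached helper sketched in the description.
    return {
        prompt: generate_hf_output("my-test-model", prompt, max_new_tokens=20)
        for prompt in STANDARD_PROMPTS
    }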

@joerunde
Collaborator Author

Okay, swapping this over to use a file-based cache, which should avoid loading the model with transformers at all in these test runs.
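
Roughly, the shape of such a file-backed cache, inferred from the snippets visible in this review (the class name and details here are assumptions, not necessarily the merged implementation):

import json
from pathlib import Path

class HFResultCache:
    """JSON-file-backed cache of expected results, keyed model -> prompt -> max_tokens."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        self.cached_results = (json.loads(cache_file.read_text())
                               if cache_file.exists() else {})
        self.dirty = False

    def write_cache(self) -> None:
        """Persist to disk, but only if new results were added this run."""
        if self.dirty:
            with open(self.cache_file, "w") as f:
                json.dump(self.cached_results, f)
            self.dirty = False

    def get_cached_result(self, model: str, prompt: str, max_tokens: int):
        # JSON object keys are always strings, so max_tokens is stringified.
        return self.cached_results.get(model, {}).get(prompt,
                                                      {}).get(str(max_tokens), {})

    def add_to_cache(self, model: str, prompt: str, max_tokens: int,
                     result) -> None:
        self.cached_results.setdefault(model, {}).setdefault(
            prompt, {})[str(max_tokens)] = result
        self.dirty = True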

Signed-off-by: Joe Runde <[email protected]>
json.dump(self.cached_results, f)
self.dirty = False

def get_cached_result(self, model: str, prompt: str,
Collaborator

Shouldn't the type annotation for prompt be Union[str, list[int]]?

return self.cached_results.get(model, {}).get(prompt,
{}).get(max_tokens, {})

def add_to_cache(self, model: str, prompt: str, max_tokens: int,
Collaborator

Same comment about the prompt type annotation

"""Use a string to represent a list of token ids, so that it can be
hashed and used as a json key."""

return "__tokens__" + "_".join(str(token_id) for token_id in token_ids)
Collaborator

nice, no tokenizer required.

Collaborator

@maxdebayser maxdebayser left a comment

LGTM. The failing main tests are due to a new upstream change fixed in #380

@joerunde joerunde enabled auto-merge (squash) August 14, 2025 19:25
@github-actions github-actions bot added the ready label Aug 14, 2025
@joerunde joerunde merged commit e6c0d33 into main Aug 14, 2025
23 checks passed
@joerunde joerunde deleted the cache-hf-results branch August 14, 2025 19:32