Conversation

joerunde (Collaborator) commented Oct 2, 2025

Fix granite-3.3-8b detection

Fixes #498

This PR updates the logic that checks for granite-3.3-8b-instruct. Instead of matching on the model name, it inspects the HF config, checking the model type and specific config values that identify the 8b granite variant.
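
As a rough illustration (not the exact code from this PR), the config-based check could look something like the sketch below. The field names come from the HF config, but the specific values used to single out the 8b variant are assumptions; in the PR the check lives on SpyrePlatform as is_granite_33_8b.

def is_granite_33_8b(model_config) -> bool:
    # Sketch only: detect the 8b Granite variant from the HF config rather
    # than from the model name or path. model_config is vLLM's ModelConfig,
    # whose hf_config attribute holds the transformers PretrainedConfig.
    hf_config = model_config.hf_config
    return (
        getattr(hf_config, "model_type", None) == "granite"
        # Illustrative values assumed to identify the 8b variant:
        and getattr(hf_config, "num_hidden_layers", None) == 40
        and getattr(hf_config, "hidden_size", None) == 4096
    )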

This also consolidates all of the granite-specific overrides into one method inside platform.py. This should make it much easier in the future to both:

  • Update these overrides for granite 3.3 8b
  • Add overrides for other models

The number-of-blocks override for the Spyre KV cache size was particularly interesting. We were in a halfway state: the value from --num-gpu-blocks-override was saved off to platform.py, and that copy was then used to change the scheduler's behavior. I've updated things so that the scheduler no longer needs the extra override and we can simply use the configured override everywhere. This is more robust, since the config value is serialized and passed to subprocesses, and more maintainable than managing two separate numbers for the same purpose.
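
A minimal sketch of that consolidation, assuming the overrides are applied in the platform's check_and_update_config hook and written into vLLM's CacheConfig (the 2080-block figure comes from the KV cache size logged later in this thread; the actual code may differ):

@classmethod
def check_and_update_config(cls, vllm_config) -> None:
    # Sketch only: apply all granite-specific overrides in one place by
    # mutating vllm_config, so the serialized config carries the same values
    # to subprocesses and the scheduler reads the one configured override.
    if cls.is_granite_33_8b(vllm_config.model_config):
        cache_config = vllm_config.cache_config
        if cache_config.num_gpu_blocks_override is None:
            # Assumed value: 2080 blocks * 64 tokens per block = 133120 tokens.
            cache_config.num_gpu_blocks_override = 2080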

github-actions bot commented Oct 2, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all of the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Joe Runde <[email protected]>
 # TODO: replace the hard coded NUM_BLOCKS_SPYRE by calling a function
 # in torch_sendnn which returns the value set by the Spyre compiler.
-if ('granite-3.3-8b-instruct' in self.model_config.model
+if (SpyrePlatform.is_granite_33_8b(self.model_config)
Collaborator

I wonder if we have to account for other granite models down the road

Collaborator Author

Very likely, given current timeframes for model support

joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde requested a review from rafvasq as a code owner October 3, 2025 22:10
joerunde and others added 4 commits October 3, 2025 16:45
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde (Collaborator, Author) commented Oct 6, 2025

@gkumbhat @tjohnson31415 I verified this with:

export VLLM_SPYRE_USE_CB=1

# Boot up with granite 33 8b
vllm serve ibm-granite/granite-3.3-8b-instruct --max-model-len 256 --max-num-seqs 4 --tens 4

# Symlink model to somewhere else
ln -s /models/huggingface_cache/hub/models--ibm-granite--granite-3.3-8b-instruct/snapshots/51dd4bc2ade4059a6bd87649d68aa11e4fb2529b/ /tmp/granite

# Disable compilation and boot again from the symlink
DISABLE_COMPILATION=1 vllm serve /tmp/granite --max-model-len 256 --max-num-seqs 4 --tens 4

Seems to work! Logs are as expected as well, showing

Spyre KV cache size: 133120 tokens

which is 2080 * 64
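
For reference, that reported size is just the block override multiplied by the tokens-per-block figure from the note above; a trivial sanity check, with both numbers taken from this thread:

num_blocks_override = 2080  # granite-3.3-8b override discussed in this PR
tokens_per_block = 64       # per the "2080 * 64" note above
assert num_blocks_override * tokens_per_block == 133120  # logged KV cache size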

gkumbhat (Collaborator) left a comment

Looks good to me!

joerunde merged commit 2e8f733 into main Oct 6, 2025
20 checks passed
joerunde deleted the g3.3-detection branch October 6, 2025 17:54
ckadner mentioned this pull request Oct 8, 2025

Development

Successfully merging this pull request may close these issues.

Name mismatch for granite model name
