Conversation

joerunde (Collaborator) commented Oct 2, 2025

Fix granite-3.3-8b detection

Fixes #498

This PR updates the logic that checks for granite-3.3-8b-instruct. Instead of matching on the model name, it inspects the HF config, checking the model type and specific config values that identify the 8b granite variant.
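
As a rough illustration (not the exact code from this PR), the config-based check could look something like the sketch below. The field names come from the HF config, but the specific values used to single out the 8b variant are assumptions; in the PR the check lives on SpyrePlatform as is_granite_33_8b.

def is_granite_33_8b(model_config) -> bool:
    # Sketch only: detect the 8b Granite variant from the HF config rather
    # than from the model name or path. model_config is vLLM's ModelConfig,
    # whose hf_config attribute holds the transformers PretrainedConfig.
    hf_config = model_config.hf_config
    return (
        getattr(hf_config, "model_type", None) == "granite"
        # Illustrative values assumed to identify the 8b variant:
        and getattr(hf_config, "num_hidden_layers", None) == 40
        and getattr(hf_config, "hidden_size", None) == 4096
    )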

This also consolidates all of the granite-specific overrides into one method inside platform.py. This should make it much easier in the future to both:

  • Update these overrides for granite 3.3 8b
  • Add overrides for other models

The number-of-blocks override for the Spyre KV cache size was particularly interesting. We were in a halfway state: the value from --num-gpu-blocks-override was saved off to platform.py, and that copy was then used to change the scheduler's behavior. I've updated things so that the scheduler no longer needs the extra override and we can simply use the configured override everywhere. This is more robust, since the config value is serialized and passed to subprocesses, and more maintainable than managing two separate numbers for the same purpose.
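
A minimal sketch of that consolidation, assuming the overrides are applied in the platform's check_and_update_config hook and written into vLLM's CacheConfig (the 2080-block figure comes from the KV cache size logged later in this thread; the actual code may differ):

@classmethod
def check_and_update_config(cls, vllm_config) -> None:
    # Sketch only: apply all granite-specific overrides in one place by
    # mutating vllm_config, so the serialized config carries the same values
    # to subprocesses and the scheduler reads the one configured override.
    if cls.is_granite_33_8b(vllm_config.model_config):
        cache_config = vllm_config.cache_config
        if cache_config.num_gpu_blocks_override is None:
            # Assumed value: 2080 blocks * 64 tokens per block = 133120 tokens.
            cache_config.num_gpu_blocks_override = 2080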

github-actions bot commented Oct 2, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all of the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Joe Runde <[email protected]>
 # TODO: replace the hard coded NUM_BLOCKS_SPYRE by calling a function
 # in torch_sendnn which returns the value set by the Spyre compiler.
-if ('granite-3.3-8b-instruct' in self.model_config.model
+if (SpyrePlatform.is_granite_33_8b(self.model_config)
Collaborator

I wonder if we have to account for other granite models down the road

Collaborator Author

Very likely, given current timeframes for model support

joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde requested a review from rafvasq as a code owner October 3, 2025 22:10
joerunde and others added 4 commits October 3, 2025 16:45
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
joerunde (Collaborator, Author) commented Oct 3, 2025

bot:test
MARKERS="spyre and cb and not quantized"

joerunde (Collaborator, Author) commented Oct 6, 2025

@gkumbhat @tjohnson31415 I verified this with:

export VLLM_SPYRE_USE_CB=1

# Boot up with granite 33 8b
vllm serve ibm-granite/granite-3.3-8b-instruct --max-model-len 256 --max-num-seqs 4 --tens 4

# Symlink model to somewhere else
ln -s /models/huggingface_cache/hub/models--ibm-granite--granite-3.3-8b-instruct/snapshots/51dd4bc2ade4059a6bd87649d68aa11e4fb2529b/ /tmp/granite

# Disable compilation and boot again from the symlink
DISABLE_COMPILATION=1 vllm serve /tmp/granite --max-model-len 256 --max-num-seqs 4 --tens 4

Seems to work! Logs are as expected as well, showing

Spyre KV cache size: 133120 tokens

which is 2080 * 64
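
For reference, that reported size is just the block override multiplied by the tokens-per-block figure from the note above; a trivial sanity check, with both numbers taken from this thread:

num_blocks_override = 2080  # granite-3.3-8b override discussed in this PR
tokens_per_block = 64       # per the "2080 * 64" note above
assert num_blocks_override * tokens_per_block == 133120  # logged KV cache size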

gkumbhat (Collaborator) left a comment

Looks good to me!

joerunde merged commit 2e8f733 into main Oct 6, 2025
20 checks passed
joerunde deleted the g3.3-detection branch October 6, 2025 17:54
ckadner mentioned this pull request Oct 8, 2025

Development

Successfully merging this pull request may close these issues.

Name mismatch for granite model name
