Commit 3e35a3a

🐛 Fix fp8 model name check with quantization check (#535)
# Description

Replace the FP8 model name check with a quantization check. A user may rename their model, which would break the name-based check and leave things in a weird state; when using a pre-compiled model, it would also trigger an unnecessary recompilation.

## Related Issues

Signed-off-by: Gaurav-Kumbhat <[email protected]>
1 parent de0e3d2 commit 3e35a3a

File tree

1 file changed: +2 −1 lines

vllm_spyre/v1/worker/spyre_worker.py

Lines changed: 2 additions & 1 deletion
@@ -444,7 +444,8 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
                 0, len(valid_token_ids_tensor), (3, prompt_len))]
 
         # TODO: we need 2 requests for warmup on FP8+CB
-        is_fp8_plus_cb = 'FP8' in self.model_config.model and \
+        # Check if model is quantized
+        is_fp8_plus_cb = self.model_config.quantization is not None and \
             envs_spyre.VLLM_SPYRE_USE_CB
         req_count = 3 if is_fp8_plus_cb else 2
         requests = [
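
For context, a minimal sketch of the new guard as a standalone helper (the helper name and parameters are hypothetical; in `spyre_worker.py` the same condition reads `self.model_config.quantization` and `envs_spyre.VLLM_SPYRE_USE_CB` directly):

```python
# Hypothetical helper mirroring the new guard; not part of the actual diff.
def needs_quantized_cb_warmup(model_config, use_continuous_batching: bool) -> bool:
    """Return True when the quantized + continuous-batching warmup path applies.

    The old guard matched the substring 'FP8' against the model name, which
    breaks as soon as a user renames the model and can force recompilation of
    a pre-compiled model. Checking the quantization config does not depend on
    the model name.
    """
    is_quantized = model_config.quantization is not None
    return is_quantized and use_continuous_batching
```

As the diff shows, the worker then issues 3 warmup requests instead of 2 when this condition holds (`req_count = 3 if is_fp8_plus_cb else 2`).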

0 commit comments

Comments
 (0)