Commit 3e35a3a

🐛 Fix fp8 model name check with quantization check (#535)
# Description

Replace the FP8 model name check with a quantization check. A user may rename their model, which would break the name-based check and leave things in a weird state; when using a pre-compiled model, it would also trigger an unnecessary recompilation.

## Related Issues

Signed-off-by: Gaurav-Kumbhat <[email protected]>
1 parent de0e3d2 commit 3e35a3a

File tree

1 file changed: +2 −1 lines

vllm_spyre/v1/worker/spyre_worker.py

Lines changed: 2 additions & 1 deletion
@@ -444,7 +444,8 @@ def _warmup_spyre_dynamic_size(self, special_token_ids):
                 0, len(valid_token_ids_tensor), (3, prompt_len))]
 
         # TODO: we need 2 requests for warmup on FP8+CB
-        is_fp8_plus_cb = 'FP8' in self.model_config.model and \
+        # Check if model is quantized
+        is_fp8_plus_cb = self.model_config.quantization is not None and \
             envs_spyre.VLLM_SPYRE_USE_CB
         req_count = 3 if is_fp8_plus_cb else 2
         requests = [
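
For context, a minimal sketch of the new guard as a standalone helper (the helper name and parameters are hypothetical; in `spyre_worker.py` the same condition reads `self.model_config.quantization` and `envs_spyre.VLLM_SPYRE_USE_CB` directly):

```python
# Hypothetical helper mirroring the new guard; not part of the actual diff.
def needs_quantized_cb_warmup(model_config, use_continuous_batching: bool) -> bool:
    """Return True when the quantized + continuous-batching warmup path applies.

    The old guard matched the substring 'FP8' against the model name, which
    breaks as soon as a user renames the model and can force recompilation of
    a pre-compiled model. Checking the quantization config does not depend on
    the model name.
    """
    is_quantized = model_config.quantization is not None
    return is_quantized and use_continuous_batching
```

As the diff shows, the worker then issues 3 warmup requests instead of 2 when this condition holds (`req_count = 3 if is_fp8_plus_cb else 2`).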

0 commit comments

Comments
 (0)