[Continuous batching] Initial cb test #52

nikolaospapandreou · 2025-03-26T15:55:40Z

Initial continuous batching (CB) implementation for vLLM V1. Works with the FMS model wrapper.
Test with offline_inference_spyre_cb_test.py; set VLLM_SPYRE_USE_CB to 1 for continuous batching or 0 for static batching.
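
For illustration only, a toggle like the one described above could be read as in the sketch below. Only the variable name VLLM_SPYRE_USE_CB and its 0/1 values come from this description; the helper name `use_continuous_batching` and the default for an unset variable are assumptions, not the plugin's actual code.

```python
import os


def use_continuous_batching(environ=None) -> bool:
    """Return True when VLLM_SPYRE_USE_CB is set to 1 (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Assumption: an unset variable falls back to static batching.
    return int(env.get("VLLM_SPYRE_USE_CB", "0")) == 1


mode = "continuous" if use_continuous_batching() else "static"
print(f"batching mode: {mode}")
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in tests.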

Signed-off-by: Nikolaos Papandreou <[email protected]>

github-actions · 2025-03-26T15:55:51Z

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes:

pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

tdoublep · 2025-03-26T16:36:17Z

vllm_spyre/model_executor/model_loader/spyre.py

-            self.tkv = 0
-            if not envs_spyre.VLLM_SPYRE_USE_CB:
-                self.model.past_key_value_states = None
+        self.tkv = tkv


Why do we set self.tkv here? It looks like it is not used.

tdoublep · 2025-03-26T16:36:31Z

vllm_spyre/model_executor/model_loader/spyre.py

            only_last_token=True,
            tkv=self.tkv,
-            active_pages=[i for i in range(input_ids.shape[0])],
+            #active_pages=[i for i in range(input_ids.shape[0])],


nit: remove commented out lines

tdoublep · 2025-03-26T16:38:17Z

vllm_spyre/v1/core/scheduler.py

        outputs = super().schedule()
        return outputs

+    def schedule_cb(self) -> "SchedulerOutput":


I would propose having separate classes, StaticBatchingSpyreScheduler and ContinuousBatchingSpyreScheduler, that each implement the schedule function differently, rather than having two functions in one class. This could be addressed in the PR to main, though, rather than in this one.
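
A rough sketch of what this split could look like is shown here. It is self-contained and does not use the real vLLM scheduler: `BaseSpyreScheduler`, the string request ids, and the `max_batch_size` argument are hypothetical stand-ins, and only the two subclass names come from the comment above.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List


class BaseSpyreScheduler(ABC):
    """Hypothetical base class; stands in for the real vLLM scheduler."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        self.waiting: Deque[str] = deque()
        self.running: List[str] = []

    def add_request(self, request_id: str) -> None:
        self.waiting.append(request_id)

    def finish_request(self, request_id: str) -> None:
        self.running.remove(request_id)

    def _fill_free_slots(self) -> None:
        # Move waiting requests into the batch while slots remain.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())

    @abstractmethod
    def schedule(self) -> List[str]:
        """Return the request ids to run in the next step."""


class StaticBatchingSpyreScheduler(BaseSpyreScheduler):
    def schedule(self) -> List[str]:
        # Static batching: form a new batch only once the previous one is done.
        if not self.running:
            self._fill_free_slots()
        return list(self.running)


class ContinuousBatchingSpyreScheduler(BaseSpyreScheduler):
    def schedule(self) -> List[str]:
        # Continuous batching: refill free slots on every step.
        self._fill_free_slots()
        return list(self.running)
```

With this shape, callers only ever invoke `schedule()`, and the choice between static and continuous batching is made once, when the scheduler object is constructed.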

tdoublep · 2025-03-26T16:38:54Z

vllm_spyre/v1/core/scheduler.py

+                available_warmup_shapes = [
+                    shape for shape in available_warmup_shapes
+                    if request.num_prompt_tokens <= shape['prompt_length']
+                    and max_tokens <= shape['new_tokens']
+                    and len(self.waiting) < shape['batch_size']
+                ]


I think that the continuous batching logic should not depend on the warmup shapes in this way?
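
To make the concern concrete, the sketch below restates the filter from the diff as a standalone function and puts a shape-free admission rule next to it. The function names and the `max_batch_size` rule are assumptions for illustration; the reviewer does not prescribe a specific replacement here.

```python
from typing import Dict, List

WarmupShape = Dict[str, int]


def fits_warmup_shapes(shapes: List[WarmupShape], num_prompt_tokens: int,
                       max_tokens: int, num_waiting: int) -> List[WarmupShape]:
    """Mirror of the filter in the diff: keep shapes that can hold the request."""
    return [
        shape for shape in shapes
        if num_prompt_tokens <= shape["prompt_length"]
        and max_tokens <= shape["new_tokens"]
        and num_waiting < shape["batch_size"]
    ]


def can_admit(num_running: int, max_batch_size: int) -> bool:
    """Hypothetical shape-free rule: admit while a batch slot is free."""
    return num_running < max_batch_size


shapes = [
    {"prompt_length": 64, "new_tokens": 20, "batch_size": 4},
    {"prompt_length": 128, "new_tokens": 20, "batch_size": 2},
]
# A 100-token prompt only fits the second shape.
print(fits_warmup_shapes(shapes, num_prompt_tokens=100, max_tokens=10,
                         num_waiting=1))
print(can_admit(num_running=3, max_batch_size=4))
```

The first function ties admission to a fixed set of precompiled shapes, while the second depends only on how many sequences are currently running.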

tdoublep · 2025-03-26T16:41:27Z

vllm_spyre/v1/worker/spyre_model_runner.py

+        self._req_ids2idx_prompt: dict = {}
+        self._req_ids2idx_decode: dict = {}
+        self._decode_batch_size = 0
+        self._active_pages = []
+        self._free_page_idxs = []
+        self._position_ids_prompt: torch.Tensor = None
+        self._mask_prompt: torch.Tensor = None
+        self._tkv: int = 0
+        self._tkv2fms: int = 0
+        self._prev_step_dec = False


In my opinion, the management of pages here is implemented at the wrong level. We should not be trying to maintain all of this state in the model runner itself. We should be looking at how vLLM (V1) is implemented on GPU as a guide (e.g., we should be using something like the InputBatch class to maintain this state). I think as a first attempt it is fine, and we can iteratively improve it from here.
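
One way to picture moving this state out of the model runner is a small container that owns the free and active page indices. The class below is a hypothetical sketch: it is not vLLM's InputBatch and does not reproduce its API; the name `PageTable` and its methods are made up for illustration.

```python
from typing import Dict, List


class PageTable:
    """Hypothetical container for per-request KV cache page state."""

    def __init__(self, num_pages: int) -> None:
        # Stored in reverse so that pop() hands out the lowest index first.
        self._free_page_idxs: List[int] = list(range(num_pages - 1, -1, -1))
        self._req_id_to_page: Dict[str, int] = {}

    def allocate(self, req_id: str) -> int:
        """Assign a free page to a new request and return its index."""
        if req_id in self._req_id_to_page:
            raise ValueError(f"request {req_id!r} already has a page")
        if not self._free_page_idxs:
            raise RuntimeError("no free page available")
        page = self._free_page_idxs.pop()
        self._req_id_to_page[req_id] = page
        return page

    def release(self, req_id: str) -> None:
        """Return the page of a finished request to the free list."""
        self._free_page_idxs.append(self._req_id_to_page.pop(req_id))

    @property
    def active_pages(self) -> List[int]:
        return sorted(self._req_id_to_page.values())
```

A model runner holding one such object would no longer need separate `_active_pages`, `_free_page_idxs`, and request-to-index dictionaries as individual attributes.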

tdoublep · 2025-03-26T16:42:00Z

vllm_spyre/v1/worker/spyre_model_runner.py

+        warmup_shapes = current_platform.get_warmup_shapes()
+        max_prompt_length = max(shape["prompt_length"]
+                                for shape in warmup_shapes)
+        max_batch_size = max(shape["batch_size"] for shape in warmup_shapes)


Continuous batching should not use warmup shapes

tdoublep · 2025-03-26T16:42:30Z

vllm_spyre/v1/worker/spyre_model_runner.py

+    def _prepare_prompt_cb(
+        self,
+        new_requests: List[NewRequestData],
+    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, List[int]]:


Similar to the scheduler, I would suggest splitting this into two model runner classes, SpyreModelRunner and ContinuousBatchingSpyreModelRunner or similar.

tdoublep · 2025-03-26T16:46:36Z

vllm_spyre/v1/worker/spyre_model_runner.py

            positions=model_input.input_positions,
            masks=model_input.input_masks,
            is_prompt=model_input.is_prompt,
+            tkv = self._tkv2fms,


Why is self._tkv2fms needed? Can't we just use self.tkv?

tdoublep · 2025-04-07T09:04:55Z

vllm_spyre/platform.py

+                # For continuous batching we use max_num_seqs to control
+                # the max batch size respecting AIU Spyre KV cache size
+                scheduler_config.max_num_seqs =\
+                    envs_spyre.VLLM_SPYRE_MAX_BATCH_SIZE
+                # ToDo: this function check_and_update_config is called twice:
+                # 1st time scheduler_config.max_num_seqs is what user sets
+                # 2nd time scheduler_config.max_num_seqs is 128


I don't think we need to override max-num-seqs at all?

yannicks1 · 2025-04-07T13:21:30Z

LGTM

* initial cb test
  Signed-off-by: Nikolaos Papandreou <[email protected]>
* make tkv, active_pages optional in SpyreCausalLM class for the V0 tests
  Signed-off-by: Nikolaos Papandreou <[email protected]>
* format
  Signed-off-by: Nikolaos Papandreou <[email protected]>
* remove manual testing and fix formatting
  Signed-off-by: Yannick Schnider <[email protected]>
* remove tkv2fms
  Signed-off-by: Yannick Schnider <[email protected]>
* remove unnecessary class variables
  Signed-off-by: Yannick Schnider <[email protected]>
* tidy up class variables
  Signed-off-by: Yannick Schnider <[email protected]>
* simplify code: req_ids2idx and active_pages will be reset in prepare input anyway...
  Signed-off-by: Yannick Schnider <[email protected]>
* renaming variable
  Signed-off-by: Yannick Schnider <[email protected]>
* removing batch padding in prefil stage
  Signed-off-by: Yannick Schnider <[email protected]>
* indices always list of Trues since no padding or removed sequences...
  Signed-off-by: Yannick Schnider <[email protected]>
* fix active/free page handling
  Signed-off-by: Yannick Schnider <[email protected]>
* avoiding unnecessary tensor construction
  Signed-off-by: Yannick Schnider <[email protected]>
* fix sorting indifference token/position_ids vs masks
  Signed-off-by: Yannick Schnider <[email protected]>
* refactoring not requiring req_ids2idx
  Signed-off-by: Yannick Schnider <[email protected]>
* removing unsused class variables, simplifying code
  Signed-off-by: Yannick Schnider <[email protected]>
* use VLLM_SPYRE_MAX_BATCH_SIZE to control (decoding) batch size on AIU Spyre
  Signed-off-by: Yannick Schnider <[email protected]>
* removing unnecessary helper functions for schedule and add_request
  Signed-off-by: Yannick Schnider <[email protected]>
* removing unused argument
  Signed-off-by: Yannick Schnider <[email protected]>

---------

Signed-off-by: Nikolaos Papandreou <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>

* [Continuous batching] FMS model wrapper (#18) * fms wrapper dummy for continuous batching implementation, gating via env var VLLM_SPYRE_USE_CB Signed-off-by: Yannick Schnider <[email protected]> * implementing fms wrapper with correct KV cache managment Signed-off-by: Yannick Schnider <[email protected]> * disable prints by default Signed-off-by: Yannick Schnider <[email protected]> * code refactoring fms wrapper Signed-off-by: Yannick Schnider <[email protected]> * fix default path not using CB/ fms wrapper Signed-off-by: Yannick Schnider <[email protected]> * correct print when TESTING_CB Signed-off-by: Yannick Schnider <[email protected]> * remove self.past_key_value_states when KV cache is managed by FMS wrapper Signed-off-by: Yannick Schnider <[email protected]> * read-out only active pages of KV cache (covers when curr batch size < max batch size) Signed-off-by: Yannick Schnider <[email protected]> * uniquely distinguishing prefills and decodes Signed-off-by: Yannick Schnider <[email protected]> * reading kv cache dimension from model config Signed-off-by: Yannick Schnider <[email protected]> * cosmetics and comments Signed-off-by: Yannick Schnider <[email protected]> * support for gpt big code models Signed-off-by: Yannick Schnider <[email protected]> * bugfix hard coded test mask Signed-off-by: Yannick Schnider <[email protected]> * change KV cache type for prefill Signed-off-by: Yannick Schnider <[email protected]> * update tkv in fms wrapper Signed-off-by: Yannick Schnider <[email protected]> * moving fms wrapper to own class Signed-off-by: Yannick Schnider <[email protected]> * reset tkv for new prompt Signed-off-by: Yannick Schnider <[email protected]> * ignoring test_spyre_tensor_parallel.py, since FMS wrapper does not support it Signed-off-by: Yannick Schnider <[email protected]> * removing VLLM_SPYRE_USE_CB, since FMS wrapper is now used by default Signed-off-by: Yannick Schnider <[email protected]> * typing fms wrapper class Signed-off-by: Yannick 
Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> * moving model loading into FMS wrapper (#35) Signed-off-by: Yannick Schnider <[email protected]> * bugfix idx kv cache update (#40) Signed-off-by: Yannick Schnider <[email protected]> * FMS Wrapper for static batching (#39) * introducing pseudo fms wrapper for static batching Signed-off-by: Yannick Schnider <[email protected]> * small bug fix Signed-off-by: Yannick Schnider <[email protected]> * bugfix idx kv cache update Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> * [Continuous Batching] Introducing new env variables (#67) * introducing env variables for AIU Spyre KV cache dimensions Signed-off-by: Yannick Schnider <[email protected]> * removing prints Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> * [Continuous batching] Initial cb test (#52) * initial cb test Signed-off-by: Nikolaos Papandreou <[email protected]> * make tkv, active_pages optional in SpyreCausalLM class for the V0 tests Signed-off-by: Nikolaos Papandreou <[email protected]> * format Signed-off-by: Nikolaos Papandreou <[email protected]> * remove manual testing and fix formatting Signed-off-by: Yannick Schnider <[email protected]> * remove tkv2fms Signed-off-by: Yannick Schnider <[email protected]> * remove unnecessary class variables Signed-off-by: Yannick Schnider <[email protected]> * tidy up class variables Signed-off-by: Yannick Schnider <[email protected]> * simplify code: req_ids2idx and active_pages will be reset in prepare input anyway... 
Signed-off-by: Yannick Schnider <[email protected]> * renaming variable Signed-off-by: Yannick Schnider <[email protected]> * removing batch padding in prefil stage Signed-off-by: Yannick Schnider <[email protected]> * indices always list of Trues since no padding or removed sequences... Signed-off-by: Yannick Schnider <[email protected]> * fix active/free page handling Signed-off-by: Yannick Schnider <[email protected]> * avoiding unnecessary tensor construction Signed-off-by: Yannick Schnider <[email protected]> * fix sorting indifference token/position_ids vs masks Signed-off-by: Yannick Schnider <[email protected]> * refactoring not requiring req_ids2idx Signed-off-by: Yannick Schnider <[email protected]> * removing unsused class variables, simplifying code Signed-off-by: Yannick Schnider <[email protected]> * use VLLM_SPYRE_MAX_BATCH_SIZE to control (decoding) batch size on AIU Spyre Signed-off-by: Yannick Schnider <[email protected]> * removing unnecessary helper functions for schedule and add_request Signed-off-by: Yannick Schnider <[email protected]> * removing unused argument Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Nikolaos Papandreou <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> Co-authored-by: Yannick Schnider <[email protected]> * re-enabling TP tests Signed-off-by: Yannick Schnider <[email protected]> * addressing feedback: renaming and removing unused stuff Signed-off-by: Yannick Schnider <[email protected]> * removing unnecessary getter function and other feedback Signed-off-by: Yannick Schnider <[email protected]> * integrating new FMS API on branch 'paged_attn_mock' Signed-off-by: Yannick Schnider <[email protected]> * torch dynamo: mark dynamic/static shapes Signed-off-by: Yannick Schnider <[email protected]> * bugfix key_value_states name Signed-off-by: Nikolaos Papandreou <[email protected]> * making block_table and slot_mapping args, not class vars Signed-off-by: Yannick Schnider 
<[email protected]> * formatting after browser merge... Signed-off-by: Yannick Schnider <[email protected]> * nicer handling of arguments continuous vs static batching Signed-off-by: Yannick Schnider <[email protected]> * Implement warmup for continuous batching (#83) * Implement warmup for continuous batching Signed-off-by: Thomas Parnell <[email protected]> * fmt Signed-off-by: Thomas Parnell <[email protected]> * freeing block directly and small things Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> Co-authored-by: Yannick Schnider <[email protected]> * initialize tkv Signed-off-by: Nikolaos Papandreou <[email protected]> * Return empty ModelRunnerOuptut if no work Signed-off-by: Nikolaos Papandreou <[email protected]> * update mask for decode Signed-off-by: Nikolaos Papandreou <[email protected]> * Fix copy/paste error Signed-off-by: Thomas Parnell <[email protected]> * adaptive loging (thx joerunde) Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> * remove warmup shapes for continuous batching Signed-off-by: Yannick Schnider <[email protected]> * assuring prefil lengths are multiples of block size 64 in example script Signed-off-by: Yannick Schnider <[email protected]> * revert change to warmup shape Signed-off-by: Thomas Parnell <[email protected]> * 🎨 fmt Signed-off-by: Joe Runde <[email protected]> * Added call to update_lazyhandle Signed-off-by: Thomas Parnell <[email protected]> * Right padding of prompts (#95) * right padding initial implementation Signed-off-by: Yannick Schnider <[email protected]> * fix right padding: remove the right padded logits before sampling Signed-off-by: Yannick Schnider <[email protected]> * fix typing Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> * [CB] Fix Tensor Parallelism Error (#103) * divide 
tensor third dimension by number of TP Signed-off-by: Sophie du Couédic <[email protected]> * Use existing method from vllm to get 'num_kv_heads' (works also for TP>1) Signed-off-by: Sophie du Couédic <[email protected]> --------- Signed-off-by: Sophie du Couédic <[email protected]> * support granite-3.2-8b-instruct (#106) Signed-off-by: Yannick Schnider <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> * comments Signed-off-by: Yannick Schnider <[email protected]> * adapt to change of arguments in fms Signed-off-by: Yannick Schnider <[email protected]> * fix mypy issue Signed-off-by: Yannick Schnider <[email protected]> * revising continuous batching scheduler Signed-off-by: Yannick Schnider <[email protected]> * [V1] Decoupling static and continuous batching (#116) * decoupling static and continuous batching scheduler Signed-off-by: Yannick Schnider <[email protected]> * fix dynamo cache for continuous batching Signed-off-by: Yannick Schnider <[email protected]> * removing warmup shape dependency for continuous batching! 
Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> * addressing review cosmetics Signed-off-by: Yannick Schnider <[email protected]> * fix/refactor: remove last_running and total_running (#112) Signed-off-by: Travis Johnson <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> Co-authored-by: Yannick Schnider <[email protected]> * fix comment kv cache tensor initialization Signed-off-by: Yannick Schnider <[email protected]> --------- Signed-off-by: Yannick Schnider <[email protected]> Signed-off-by: Yannick Schnider <[email protected]> Signed-off-by: Nikolaos Papandreou <[email protected]> Signed-off-by: Thomas Parnell <[email protected]> Signed-off-by: Joe Runde <[email protected]> Signed-off-by: Sophie du Couédic <[email protected]> Signed-off-by: Travis Johnson <[email protected]> Co-authored-by: Nikolaos Papandreou <[email protected]> Co-authored-by: Thomas Parnell <[email protected]> Co-authored-by: Joe Runde <[email protected]> Co-authored-by: Sophie du Couédic <[email protected]> Co-authored-by: Travis Johnson <[email protected]>

initial cb test

d3f42ed

Signed-off-by: Nikolaos Papandreou <[email protected]>

nikolaospapandreou requested review from sducouedic, tdoublep and yannicks1 March 26, 2025 15:55

tdoublep reviewed Mar 26, 2025

nikolaospapandreou added 3 commits March 28, 2025 21:19

make tkv, active_pages optional in SpyreCausalLM class for the V0 tests

d6b7afa

Signed-off-by: Nikolaos Papandreou <[email protected]>

sync with dev branch, new classes for static and continuous batching

9d4c961

Signed-off-by: Nikolaos Papandreou <[email protected]>

format

ac1369a

Signed-off-by: Nikolaos Papandreou <[email protected]>

yannicks1 reviewed Apr 4, 2025

examples/offline_inference_spyre_cb_test.py (resolved)

yannicks1 added 14 commits April 4, 2025 09:56

remove manual testing and fix formatting

8cd2318

Signed-off-by: Yannick Schnider <[email protected]>

remove tkv2fms

6cf29aa

Signed-off-by: Yannick Schnider <[email protected]>

remove unnecessary class variables

a6942a3

Signed-off-by: Yannick Schnider <[email protected]>

tidy up class variables

dbc24e3

Signed-off-by: Yannick Schnider <[email protected]>

simplify code: req_ids2idx and active_pages will be reset in prepare …

fb43f8c

…input anyway... Signed-off-by: Yannick Schnider <[email protected]>

renaming variable

04af67b

Signed-off-by: Yannick Schnider <[email protected]>

removing batch padding in prefil stage

1135210

Signed-off-by: Yannick Schnider <[email protected]>

indices always list of Trues since no padding or removed sequences...

a184b0b

Signed-off-by: Yannick Schnider <[email protected]>

fix active/free page handling

98bf15a

Signed-off-by: Yannick Schnider <[email protected]>

avoiding unnecessary tensor construction

e1dd52b

Signed-off-by: Yannick Schnider <[email protected]>

fix sorting indifference token/position_ids vs masks

c54fcee

Signed-off-by: Yannick Schnider <[email protected]>

refactoring not requiring req_ids2idx

47ed1e7

Signed-off-by: Yannick Schnider <[email protected]>

removing unsused class variables, simplifying code

cbb5980

Signed-off-by: Yannick Schnider <[email protected]>

use VLLM_SPYRE_MAX_BATCH_SIZE to control (decoding) batch size on AIU…

717f05f

… Spyre Signed-off-by: Yannick Schnider <[email protected]>

tdoublep reviewed Apr 7, 2025

yannicks1 added 2 commits April 7, 2025 12:41

removing unnecessary helper functions for schedule and add_request

80850ca

Signed-off-by: Yannick Schnider <[email protected]>

removing unused argument

fd670af

Signed-off-by: Yannick Schnider <[email protected]>

yannicks1 merged commit 3ab164d into dev-continuous-batching Apr 7, 2025
2 checks passed

yannicks1 deleted the npo-cb-test branch April 7, 2025 13:21

3 participants