[V1] Add BlockTable class #11693
Conversation
Signed-off-by: Woosuk Kwon <[email protected]>
vllm/v1/worker/gpu_block_table.py (Outdated)

        self.block_table.fill_(0)
        self.block_table_cpu.fill_(0)

    def cuda(self) -> torch.Tensor:
It might make sense to call this something other than cuda(), since ideally this class can be shared across all backends.
Maybe like to_device()
I was about to ask a similar question lol. The file name is "gpu"_block_table.py. Is this BlockTable supposed to be used only by GPUs, or is it actually general purpose?
@robertgshaw2-neuralmagic @comaniac Good point. I renamed gpu_block_table.py to block_table.py and cuda to to_device as you suggested.

That being said, I plan to add a GPU-specific optimization to speed up the block table copy from CPU to GPU. Since that optimization will involve a CUDA kernel, it will not be shared with other hardware.

Also, please note that the shape of the block table actually depends on the attention kernel. For example, FlashInfer requires a different layout than the one in this PR. Likewise, other hardware might want different layouts and therefore possibly different implementations of append_row and move_row.
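To make the layout point concrete, below is a rough, hypothetical sketch of a row-major block table with per-row bookkeeping; the class, attribute, and method names are illustrative and this is not the PR's actual code. A backend whose attention kernel expects a different layout would need different append_row / move_row implementations.

```python
import numpy as np
import torch


class BlockTableSketch:
    """Illustrative sketch only: one row of block IDs per request."""

    def __init__(self, max_num_reqs: int, max_num_blocks_per_req: int) -> None:
        shape = (max_num_reqs, max_num_blocks_per_req)
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.block_table = torch.zeros(shape, dtype=torch.int32, device=device)
        self.block_table_cpu = torch.zeros(shape, dtype=torch.int32)
        # Number of valid block IDs currently stored in each row.
        self.num_blocks_per_row = np.zeros(max_num_reqs, dtype=np.int32)

    def append_row(self, row_idx: int, block_ids: list[int]) -> None:
        # Append new block IDs after the row's current valid prefix.
        start = int(self.num_blocks_per_row[row_idx])
        end = start + len(block_ids)
        self.block_table_cpu[row_idx, start:end] = torch.tensor(
            block_ids, dtype=torch.int32)
        self.num_blocks_per_row[row_idx] = end

    def move_row(self, src: int, tgt: int) -> None:
        # Copy only the valid prefix of the source row rather than the
        # whole (potentially mostly empty) row.
        num_blocks = int(self.num_blocks_per_row[src])
        self.block_table_cpu[tgt, :num_blocks] = \
            self.block_table_cpu[src, :num_blocks]
        self.num_blocks_per_row[tgt] = num_blocks
```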
nice!
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Woosuk Kwon <[email protected]>
    def to_device(self) -> torch.Tensor:
        """Returns the device tensor of the block table."""
        return self.block_table
I looked at this API again and found it's a bit weird to call it to_device, because we are not actually transferring tensors in this call (unlike torch_tensor.to("cuda")). Following the naming convention of .cpu(), this API should be named .device(), but I'm not sure if this makes sense to others.
Good point. I actually named it device first, and then found that the class already had a device attribute 😂 and PyTorch's convention is that x.device returns the device x lives on.
Ah that's true... then another way is to name everything with a verb, like to_device, to_cpu, to_numpy. Although we don't actually do any transfer in these calls, this may be less confusing. @robertgshaw2-neuralmagic WDYT?
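To illustrate the distinction being debated, here is a minimal standalone sketch (the names and shapes are made up, not from this PR): the accessors simply hand back tensors that were allocated on their device up front, whereas torch.Tensor.to("cuda") would allocate and copy.

```python
import numpy as np
import torch


class TwinTables:
    """Illustrative only: one tensor per device, allocated once."""

    def __init__(self) -> None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.block_table = torch.zeros(4, 8, dtype=torch.int32, device=device)
        self.block_table_cpu = torch.zeros(4, 8, dtype=torch.int32)

    def to_device(self) -> torch.Tensor:
        # No data movement: the device tensor already exists.
        return self.block_table

    def to_cpu(self) -> torch.Tensor:
        # Likewise, just returns the pre-allocated CPU tensor.
        return self.block_table_cpu

    def to_numpy(self) -> np.ndarray:
        # Zero-copy NumPy view of the CPU tensor's storage.
        return self.block_table_cpu.numpy()
```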
This PR adds the BlockTable class, a thin wrapper around the GPU & CPU block table tensors. It will help reduce the complexity of the input preparation logic.

Also, BlockTable optimizes the memory movement when switching rows of the CPU block table tensor, by tracking the actual number of blocks per row and doing only the necessary copies (instead of blindly copying entire rows).

NOTE: This PR is a precursor to #11401, which optimizes the block table copy from CPU to GPU.
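As a rough illustration of the CPU-to-GPU step that #11401 then optimizes (the function and argument names below are hypothetical, not taken from either PR): only the rows belonging to active requests need to be synchronized to the device tensor.

```python
import torch


def copy_block_table_to_device(block_table: torch.Tensor,
                               block_table_cpu: torch.Tensor,
                               num_reqs: int) -> None:
    # Copy only the first num_reqs rows; the remaining rows of the device
    # tensor are stale but unused. non_blocking=True lets the copy overlap
    # with other work when the CPU tensor is in pinned memory.
    block_table[:num_reqs].copy_(block_table_cpu[:num_reqs], non_blocking=True)
```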