
[V1] Add BlockTable class #11693

Open · wants to merge 4 commits into main

Conversation

@WoosukKwon (Collaborator) commented Jan 2, 2025

This PR adds the BlockTable class, a thin wrapper around the GPU and CPU block table tensors. It will help reduce the complexity of the input preparation logic.
BlockTable also optimizes the memory movement involved in switching rows of the CPU block table tensor: by tracking the actual number of blocks in each row, it copies only what is needed instead of blindly copying entire rows.
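For illustration, a minimal sketch of what such a class could look like; the field and method names below are assumptions for this sketch and may differ from the actual class in the PR:

import numpy as np
import torch


class BlockTable:
    """Sketch: padded 2D block table with per-row usage tracking."""

    def __init__(self, max_num_reqs: int, max_num_blocks_per_req: int,
                 device: torch.device):
        self.block_table = torch.zeros(
            (max_num_reqs, max_num_blocks_per_req),
            dtype=torch.int32, device=device)
        self.block_table_cpu = torch.zeros(
            (max_num_reqs, max_num_blocks_per_req),
            dtype=torch.int32, device="cpu")
        # How many blocks each row actually uses, so copies can be limited
        # to the used prefix instead of the full padded row.
        self.num_blocks_per_row = np.zeros(max_num_reqs, dtype=np.int32)

    def append_row(self, row_idx: int, block_ids: list[int]) -> None:
        # Write new block IDs into the unused tail of the row.
        start = int(self.num_blocks_per_row[row_idx])
        end = start + len(block_ids)
        self.block_table_cpu[row_idx, start:end] = torch.tensor(
            block_ids, dtype=torch.int32)
        self.num_blocks_per_row[row_idx] = end

    def move_row(self, src: int, tgt: int) -> None:
        # Copy only the used prefix of the source row, not the whole row.
        n = int(self.num_blocks_per_row[src])
        self.block_table_cpu[tgt, :n] = self.block_table_cpu[src, :n]
        self.num_blocks_per_row[tgt] = n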

NOTE: This PR is a precursor to #11401 which optimizes the block table copy from CPU to GPU.

Signed-off-by: Woosuk Kwon <[email protected]>
@WoosukKwon added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jan 2, 2025

github-actions bot commented Jan 2, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

        self.block_table.fill_(0)
        self.block_table_cpu.fill_(0)

    def cuda(self) -> torch.Tensor:
Collaborator

it might make sense to call this something other than cuda() since I think this class can be shared across all backends ideally

Collaborator

Maybe like to_device()

Collaborator

I was about to ask a similar question lol
The file name is "gpu"_block_table.py. Is this BlockTable supposed to be used only by GPUs, or is it actually general purpose?

Collaborator Author

@robertgshaw2-neuralmagic @comaniac Good point.
I renamed gpu_block_table.py to block_table.py and cuda to to_device as you suggested.

That being said, I plan to add a GPU-specific optimization for the block table copy from CPU to GPU. Since this optimization will involve a CUDA kernel, it will not be shared with other hardware.
Also, please note that the shape of the block table actually depends on the attention kernel. For example, FlashInfer requires a different layout than the one in the current PR. Likewise, other hardware backends might want different layouts, and therefore possibly different implementations of append_row and move_row.
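For illustration only, here is a rough sketch of how append_row could differ between a padded 2D table (as in this PR) and a FlashInfer-style ragged layout (a flat list of page indices plus an indptr array); all names below are hypothetical and not taken from the PR:

import torch


def append_row_padded(block_table: torch.Tensor, num_blocks: list[int],
                      row_idx: int, block_ids: list[int]) -> None:
    # Fixed-width [num_reqs, max_blocks_per_req] table: write into the
    # unused tail of the row and bump the per-row count.
    start = num_blocks[row_idx]
    end = start + len(block_ids)
    block_table[row_idx, start:end] = torch.tensor(block_ids,
                                                   dtype=torch.int32)
    num_blocks[row_idx] = end


def append_row_ragged(page_indices: list[int], indptr: list[int],
                      row_idx: int, block_ids: list[int]) -> None:
    # CSR-like layout: row i owns page_indices[indptr[i]:indptr[i + 1]].
    # Appending requires inserting into the flat list and shifting the
    # offsets of all later rows.
    insert_at = indptr[row_idx + 1]
    page_indices[insert_at:insert_at] = block_ids
    for i in range(row_idx + 1, len(indptr)):
        indptr[i] += len(block_ids)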

@robertgshaw2-neuralmagic (Collaborator)

nice!


mergify bot commented Jan 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @WoosukKwon.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Jan 2, 2025
@mergify bot removed the needs-rebase label Jan 3, 2025
Comment on lines +68 to +70
    def to_device(self) -> torch.Tensor:
        """Returns the device tensor of the block table."""
        return self.block_table
@comaniac (Collaborator) commented Jan 3, 2025

I looked at this API again and found it's a bit weird to call it to_device, because we are not actually transferring tensors in this call (the way torch_tensor.to("cuda") does). Following the naming convention of .cpu(), this API should be named .device(), but I'm not sure if this makes sense to others.

Collaborator Author

Good point.

I actually named it device at first, and then found that the class already had a device attribute 😂 and PyTorch's convention is that x.device returns the device x lives on.

Collaborator

Ah, that's true... then another way is to name everything with a verb, like to_device, to_cpu, to_numpy. Although we don't actually do any transfer in these calls, this may be less confusing. @robertgshaw2-neuralmagic WDYT?
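For reference, a sketch of what those verb-named accessors could look like; the names are taken from the comment above and this is not the PR's actual API:

import numpy as np
import torch


class BlockTable:
    def __init__(self, device_tensor: torch.Tensor,
                 cpu_tensor: torch.Tensor):
        self.block_table = device_tensor
        self.block_table_cpu = cpu_tensor

    def to_device(self) -> torch.Tensor:
        """Return the device-resident tensor; no transfer is performed."""
        return self.block_table

    def to_cpu(self) -> torch.Tensor:
        """Return the CPU-resident tensor; no transfer is performed."""
        return self.block_table_cpu

    def to_numpy(self) -> np.ndarray:
        """Return a NumPy view sharing memory with the CPU tensor."""
        return self.block_table_cpu.numpy()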
