bucket: add query len 1 to prefill bucket #645

xinyu-intel · 2025-11-27T02:17:05Z

Avoid padding the length-1 query of a prefix prefill on the decode side up to the block size in the PD+DP scenario.

Use case:

Set VLLM_PROMPT_QUERY_BUCKET_MIN=1 on the decode side, with VLLM_EXPONENTIAL_BUCKETING set to either false or true.
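
To illustrate why the bucket minimum matters, here is a minimal sketch of rounding a query length up to the nearest bucket. It is not the vllm-gaudi implementation: the helpers linear_buckets and pad_to_bucket, the block size of 128, and the maximum of 512 are all assumptions made for the example.

```python
import bisect


def linear_buckets(bmin: int, bstep: int, bmax: int) -> list[int]:
    """Hypothetical helper: bmin plus every multiple of bstep up to bmax."""
    buckets = {bmin}
    buckets.update(range(bstep, bmax + 1, bstep))
    return sorted(b for b in buckets if bmin <= b <= bmax)


def pad_to_bucket(query_len: int, buckets: list[int]) -> int:
    """Return the smallest bucket that can hold query_len tokens."""
    idx = bisect.bisect_left(buckets, query_len)
    if idx == len(buckets):
        raise ValueError(f"query_len {query_len} exceeds largest bucket")
    return buckets[idx]


BLOCK_SIZE = 128  # assumed block size, for illustration only

# Minimum bucket equal to the block size (behaviour before this PR, as described).
old = linear_buckets(bmin=BLOCK_SIZE, bstep=BLOCK_SIZE, bmax=512)
# Minimum bucket of 1 (what VLLM_PROMPT_QUERY_BUCKET_MIN=1 asks for).
new = linear_buckets(bmin=1, bstep=BLOCK_SIZE, bmax=512)

padded_before = pad_to_bucket(1, old)  # a length-1 query is padded to a full block
padded_after = pad_to_bucket(1, new)   # a length-1 query keeps its length
```

With a bucket of 1 available, the length-1 query needs no padding, while longer queries still round up to the next multiple of the block size.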

Copilot

Pull request overview

This PR modifies the bucketing configuration to support a query length of 1 for prefill operations, preventing unnecessary padding to the block size in prefill/decode (PD) disaggregation scenarios.

Changes the minimum query bucket size from block_size to 1
Updates dummy prefill batch generation to use query_len=1 and context_len=127
Adds support for bucket value 1 in the exponential bucketing warmup logic
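
As a rough illustration of an exponential bucket range that starts at 1, the sketch below generates a plain doubling sequence. This is an assumption-laden example, not the logic in exponential.py: the helper name doubling_buckets and the choice to always append the upper limit are hypothetical.

```python
def doubling_buckets(bmin: int, bmax: int) -> list[int]:
    """Hypothetical helper: bmin, 2*bmin, 4*bmin, ... capped at bmax."""
    if bmin < 1 or bmax < bmin:
        raise ValueError("require 1 <= bmin <= bmax")
    buckets = []
    value = bmin
    while value < bmax:
        buckets.append(value)
        value *= 2
    buckets.append(bmax)  # always include the upper limit as the last bucket
    return buckets


from_one = doubling_buckets(1, 128)
from_block = doubling_buckets(128, 1024)
```

Starting from 1 adds the small buckets below the block size, so a length-1 query has an exact match instead of falling into the first block-sized bucket.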

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
vllm_gaudi/extension/bucketing/linear.py	Sets minimum prompt query bucket to 1 instead of block_size
vllm_gaudi/extension/bucketing/exponential.py	Updates exponential bucketing to start from 1 and adds logic to handle bucket value 1
vllm_gaudi/v1/worker/hpu_model_runner.py	Adjusts dummy prefill batch to use query_len=1 and context_len=127
tests/unit_tests/test_bucketing.py	Adds test case for warmup_range starting with 1


github-actions · 2025-11-27T04:35:05Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

github-actions · 2025-11-28T06:19:42Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

github-actions · 2025-11-28T08:53:21Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

Copilot AI review requested due to automatic review settings November 27, 2025 02:17

xinyu-intel requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, kzawora-intel, mgawarkiewicz-intel, michalkuligowski, vivekgoe and xuechendi as code owners November 27, 2025 02:17

Copilot AI reviewed Nov 27, 2025

vllm_gaudi/extension/bucketing/exponential.py (outdated)

vllm_gaudi/v1/worker/hpu_model_runner.py (2 comments)

xinyu-intel force-pushed the dev/xinyu/prefill-bucket-one branch from 500c177 to 37b3e7d Compare November 27, 2025 03:29

xinyu-intel force-pushed the dev/xinyu/prefill-bucket-one branch 2 times, most recently from 32c4ddc to 01a12b7 Compare November 28, 2025 05:17

xinyu-intel and others added 2 commits November 28, 2025 13:29

bucket: add query len 1 to prefill bucket

2e56818

Signed-off-by: Xinyu Chen <[email protected]>

make sure num tokens divisible by tp_size

01a12b7

Signed-off-by: Wuxun Zhang <[email protected]>
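
One common way to make a token count divisible by tp_size is to round it up to the next multiple. The sketch below shows only that arithmetic; the helper name round_up_to_multiple is hypothetical, and whether the commit rounds up or handles the constraint differently is not stated in this conversation.

```python
def round_up_to_multiple(num_tokens: int, tp_size: int) -> int:
    """Hypothetical helper: smallest multiple of tp_size >= num_tokens."""
    if tp_size <= 0:
        raise ValueError("tp_size must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-num_tokens // tp_size) * tp_size


padded_single = round_up_to_multiple(1, 8)    # a single token padded up to tp_size
padded_exact = round_up_to_multiple(16, 8)    # already divisible, left unchanged
padded_next = round_up_to_multiple(17, 8)     # rounded up to the next multiple
```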

wuxun-zhang reviewed Nov 28, 2025

vllm_gaudi/v1/worker/hpu_dp_utils.py (outdated)

xinyu-intel and others added 2 commits November 28, 2025 15:55

bucket: add query len 1 to prefill exp bucket

0e75f53

Signed-off-by: Xinyu Chen <[email protected]>

Update vllm_gaudi/v1/worker/hpu_dp_utils.py

fd61e58

Co-authored-by: Wuxun Zhang <[email protected]> Signed-off-by: Xinyu Chen <[email protected]>

2 participants
