@xinyu-intel
Contributor

@xinyu-intel xinyu-intel commented Nov 27, 2025

Avoid padding the query length (1) of the prefix prefill on the decode side up to the block size in the PD+DP scenario.

Use case:

Set VLLM_EXPONENTIAL_BUCKETING=false (or true) and VLLM_PROMPT_QUERY_BUCKET_MIN=1 on the decode side.
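
As a minimal sketch of the use case above, the two variables could be prepared in Python before the decode-side process is started. This assumes the variables are read from the process environment at startup; the helper name `decode_side_env` is hypothetical, and the launch command itself is omitted because the PR does not specify it.

```python
import os


def decode_side_env(exponential: bool, base_env=None) -> dict:
    """Return a copy of the environment with the two bucketing variables set.

    Hypothetical helper for illustration; only the variable names and
    values come from the PR description.
    """
    env = dict(os.environ if base_env is None else base_env)
    # The PR lists both "false" and "true" as supported values.
    env["VLLM_EXPONENTIAL_BUCKETING"] = "true" if exponential else "false"
    # Allow a prompt query bucket of length 1 instead of the block size.
    env["VLLM_PROMPT_QUERY_BUCKET_MIN"] = "1"
    return env


env = decode_side_env(exponential=False, base_env={})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the decode-side server.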

Contributor

Copilot AI left a comment

Pull request overview

This PR modifies the bucketing configuration to support a query length of 1 for prefill operations, preventing unnecessary padding to the block size in prefill/decode (PD) disaggregation scenarios.

  • Changes the minimum query bucket size from block_size to 1
  • Updates dummy prefill batch generation to use query_len=1 and context_len=127
  • Adds support for bucket value 1 in the exponential bucketing warmup logic
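
The effect of the first change can be illustrated with a small sketch of bucket padding. It is not the repository's code: `linear_buckets` and `pad_to_bucket` are hypothetical helpers, and the block size of 128 is an assumption inferred from query_len=1 plus context_len=127 filling exactly one block.

```python
import bisect

BLOCK_SIZE = 128  # assumed value: query_len=1 + context_len=127 fills one block


def linear_buckets(bmin, step, bmax):
    """Hypothetical bucket list: bmin plus every multiple of step up to bmax."""
    buckets = {bmin}
    buckets.update(range(step, bmax + 1, step))
    return sorted(buckets)


def pad_to_bucket(length, buckets):
    """Round length up to the smallest bucket that can hold it."""
    idx = bisect.bisect_left(buckets, length)
    if idx == len(buckets):
        raise ValueError(f"length {length} exceeds the largest bucket")
    return buckets[idx]


old = linear_buckets(BLOCK_SIZE, BLOCK_SIZE, 512)  # minimum bucket = block size
new = linear_buckets(1, BLOCK_SIZE, 512)           # minimum bucket = 1

print(pad_to_bucket(1, old))  # 128: a single query token is padded to a full block
print(pad_to_bucket(1, new))  # 1: no padding once a bucket of 1 exists
```

With the minimum bucket at the block size, a query of length 1 is padded to 128 tokens; with a bucket of 1 available, it is left unpadded.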

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| vllm_gaudi/extension/bucketing/linear.py | Sets minimum prompt query bucket to 1 instead of block_size |
| vllm_gaudi/extension/bucketing/exponential.py | Updates exponential bucketing to start from 1 and adds logic to handle bucket value 1 |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Adjusts dummy prefill batch to use query_len=1 and context_len=127 |
| tests/unit_tests/test_bucketing.py | Adds test case for warmup_range starting with 1 |
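
The new test is described only as covering warmup_range starting with 1. As a rough illustration of what such a check might look like, the sketch below uses a hypothetical stand-in, `ramp_then_linear_range`, rather than the repository's `warmup_range`. The doubling ramp-up followed by linear steps is an assumption about the range's shape, not something the PR page confirms.

```python
def ramp_then_linear_range(bmin, bstep, bmax):
    """Hypothetical range: double from bmin until bstep, then step linearly.

    Illustrative only; not the actual warmup_range in vllm_gaudi.
    """
    if bmin < 1:
        raise ValueError("bmin must be at least 1 for the doubling phase")
    buckets = []
    value = bmin
    # Ramp-up phase: 1, 2, 4, ... while still below the linear step.
    while value < bstep and value <= bmax:
        buckets.append(value)
        value *= 2
    # Stable phase: multiples of bstep up to and including bmax.
    buckets.extend(range(bstep, bmax + 1, bstep))
    return buckets


def test_range_starting_with_one():
    result = ramp_then_linear_range(1, 128, 512)
    assert result[0] == 1               # bucket value 1 is present
    assert result == sorted(result)     # buckets are in ascending order
    assert result[-1] == 512            # the maximum is still covered


test_range_starting_with_one()
print(ramp_then_linear_range(1, 128, 512))
```

Under these assumptions, starting from 1 adds the small power-of-two buckets in front of the block-aligned ones without changing the upper end of the range.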

@xinyu-intel force-pushed the dev/xinyu/prefill-bucket-one branch from 500c177 to 37b3e7d on November 27, 2025 03:29
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

@xinyu-intel force-pushed the dev/xinyu/prefill-bucket-one branch 2 times, most recently from 32c4ddc to 01a12b7 on November 28, 2025 05:17
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0353d2e162cbda776d9dbfe026e65303204a7f1f
