
Conversation

@DiweiSun

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from 4738963 to e29841b on September 19, 2025 at 05:11
DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from e29841b to cfe8f7a on September 19, 2025 at 05:16
@DiweiSun
Author

DiweiSun commented Sep 19, 2025

BKC (best known configuration) to run:

python benchmark/benchmark_reshape_and_cache.py

or, for a specific model:
python benchmark/benchmark_reshape_and_cache.py --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
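For context, a kernel microbenchmark of this kind usually reports the average latency of a kernel launch after a warmup phase, timed around an explicit device synchronize. The sketch below shows only that pattern; the helper name, iteration counts, and the torch.xpu synchronize call are assumptions for illustration and may not match the actual script.

import time

import torch

def time_kernel(fn, warmup: int = 10, iters: int = 100) -> float:
    """Average latency of a zero-argument kernel launch, in microseconds."""
    for _ in range(warmup):
        fn()
    torch.xpu.synchronize()  # assumes an Intel XPU device; use torch.cuda.synchronize() on CUDA
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.xpu.synchronize()
    return (time.perf_counter() - start) / iters * 1e6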

Results:

Before:

num_tokens num_heads head_size block_size num_blocks dtype kv_cache_dtype latency (us)
2 8 128 64 1024 half auto 10.452
4 8 128 64 1024 half auto 9.555
8 8 128 64 1024 half auto 9.926
16 8 128 64 1024 half auto 9.437
32 8 128 64 1024 half auto 10.450
64 8 128 64 1024 half auto 9.904
128 8 128 64 1024 half auto 46.160
256 8 128 64 1024 half auto 201.855
512 8 128 64 1024 half auto 424.268
1024 8 128 64 1024 half auto 846.070
2048 8 128 64 1024 half auto 1697.350

After:

Test num_tokens num_heads head_size block_size num_blocks vLLM latency (us) IPEX latency (us)
0 2.0 12.0 128.0 64.0 1024.0 21.138 14.040
1 2.0 32.0 128.0 64.0 1024.0 16.172 16.380
2 2.0 40.0 128.0 64.0 1024.0 17.472 17.628
3 2.0 48.0 128.0 64.0 1024.0 18.564 18.876
4 2.0 64.0 128.0 64.0 1024.0 20.228 20.384
5 2.0 96.0 128.0 64.0 1024.0 24.076 24.232
6 2.0 128.0 128.0 64.0 1024.0 27.768 27.768
7 4.0 12.0 128.0 64.0 1024.0 15.496 15.444
8 4.0 32.0 128.0 64.0 1024.0 111.306 433.784
9 4.0 40.0 128.0 64.0 1024.0 21.060 21.164
10 4.0 48.0 128.0 64.0 1024.0 180.726 369.460
11 4.0 64.0 128.0 64.0 1024.0 26.052 25.792
12 4.0 96.0 128.0 64.0 1024.0 32.448 32.032
13 4.0 128.0 128.0 64.0 1024.0 203.632 132.704
14 8.0 12.0 128.0 64.0 1024.0 19.422 17.680
15 8.0 32.0 128.0 64.0 1024.0 266.292 95.966
16 8.0 40.0 128.0 64.0 1024.0 28.912 30.160
17 8.0 48.0 128.0 64.0 1024.0 31.980 31.824
18 8.0 64.0 128.0 64.0 1024.0 39.416 39.624
19 8.0 96.0 128.0 64.0 1024.0 245.960 55.770
20 8.0 128.0 128.0 64.0 1024.0 271.414 77.012
21 16.0 12.0 128.0 64.0 1024.0 275.184 22.568
22 16.0 32.0 128.0 64.0 1024.0 39.468 39.832
23 16.0 40.0 128.0 64.0 1024.0 291.512 247.130
24 16.0 48.0 128.0 64.0 1024.0 55.744 284.388
25 16.0 64.0 128.0 64.0 1024.0 201.708 335.400
26 16.0 96.0 128.0 64.0 1024.0 131.664 255.008
27 16.0 128.0 128.0 64.0 1024.0 186.784 239.616
28 32.0 12.0 128.0 64.0 1024.0 32.422 31.460
29 32.0 32.0 128.0 64.0 1024.0 215.878 77.012
30 32.0 40.0 128.0 64.0 1024.0 277.056 104.884
31 32.0 48.0 128.0 64.0 1024.0 336.856 290.316
32 32.0 64.0 128.0 64.0 1024.0 189.020 262.002
33 32.0 96.0 128.0 64.0 1024.0 302.744 301.860
34 32.0 128.0 128.0 64.0 1024.0 419.068 398.294
35 64.0 12.0 128.0 64.0 1024.0 259.298 277.004
36 64.0 32.0 128.0 64.0 1024.0 187.044 282.672
37 64.0 40.0 128.0 64.0 1024.0 247.156 246.168
38 64.0 48.0 128.0 64.0 1024.0 295.412 305.396
39 64.0 64.0 128.0 64.0 1024.0 422.760 423.124
40 64.0 96.0 128.0 64.0 1024.0 656.396 657.124
41 64.0 128.0 128.0 64.0 1024.0 892.762 873.288
42 128.0 12.0 128.0 64.0 1024.0 274.092 125.320
43 128.0 32.0 128.0 64.0 1024.0 413.374 410.644
44 128.0 40.0 128.0 64.0 1024.0 532.116 535.002
45 128.0 48.0 128.0 64.0 1024.0 646.958 644.930
46 128.0 64.0 128.0 64.0 1024.0 874.978 884.780
47 128.0 96.0 128.0 64.0 1024.0 1360.424 1329.224
48 128.0 128.0 128.0 64.0 1024.0 1821.014 1834.534
49 256.0 12.0 128.0 64.0 1024.0 285.584 291.200
50 256.0 32.0 128.0 64.0 1024.0 870.896 874.640
51 256.0 40.0 128.0 64.0 1024.0 1111.812 1108.068
52 256.0 48.0 128.0 64.0 1024.0 1327.118 1345.344
53 256.0 64.0 128.0 64.0 1024.0 1805.622 1809.470
54 256.0 96.0 128.0 64.0 1024.0 2778.828 2763.592
55 256.0 128.0 128.0 64.0 1024.0 3720.834 3713.008
56 512.0 12.0 128.0 64.0 1024.0 613.288 624.312
57 512.0 32.0 128.0 64.0 1024.0 1780.870 1781.468
58 512.0 40.0 128.0 64.0 1024.0 2263.430 2242.006
59 512.0 48.0 128.0 64.0 1024.0 2735.954 2743.624
60 512.0 64.0 128.0 64.0 1024.0 3687.424 3672.994
61 512.0 96.0 128.0 64.0 1024.0 5611.918 5614.388
62 512.0 128.0 128.0 64.0 1024.0 7564.310 7538.284
63 1024.0 12.0 128.0 64.0 1024.0 1282.112 1286.766
64 1024.0 32.0 128.0 64.0 1024.0 3596.528 3574.220
65 1024.0 40.0 128.0 64.0 1024.0 4573.244 4568.980
66 1024.0 48.0 128.0 64.0 1024.0 5542.264 5533.216
67 1024.0 64.0 128.0 64.0 1024.0 7370.090 7438.262
68 1024.0 96.0 128.0 64.0 1024.0 11320.296 11353.836
69 1024.0 128.0 128.0 64.0 1024.0 15117.284 15136.446
70 2048.0 12.0 128.0 64.0 1024.0 2607.332 2591.680
71 2048.0 32.0 128.0 64.0 1024.0 7277.816 7262.684
72 2048.0 40.0 128.0 64.0 1024.0 9194.432 9175.192
73 2048.0 48.0 128.0 64.0 1024.0 11150.646 11144.952
74 2048.0 64.0 128.0 64.0 1024.0 14891.396 14904.058
75 2048.0 96.0 128.0 64.0 1024.0 22627.514 22739.002
76 2048.0 128.0 128.0 64.0 1024.0 30439.552 30527.848

@jikunshang
Collaborator

cc @zufangzhu

@jikunshang
Collaborator

If using --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", why does num_heads still vary? Is it due to TP?

@@ -1,411 +1,82 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

I copied this file from vLLM; I think the torch.compile version is worth keeping.

Author

fixed
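For readers unfamiliar with the torch.compile variant discussed above: a compiled reference implementation of reshape_and_cache typically scatters the key/value tensors into the paged KV cache by slot index, roughly as sketched below. The function name, tensor layouts, and signature are illustrative assumptions, not the code that was in this file.

import torch

@torch.compile
def reshape_and_cache_ref(key, value, key_cache, value_cache, slot_mapping, block_size):
    # key, value: [num_tokens, num_heads, head_size]
    # key_cache, value_cache: [num_blocks, block_size, num_heads, head_size] (layout assumed)
    # slot_mapping: [num_tokens], flat slot index of each token in the paged cache
    block_idx = slot_mapping // block_size
    block_off = slot_mapping % block_size
    key_cache[block_idx, block_off] = key
    value_cache[block_idx, block_off] = value

Keeping such a variant alongside the custom kernel lets the benchmark compare the compiled eager path against the hand-written kernel on the same inputs.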

DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from cfe8f7a to ccd7f37 on September 19, 2025 at 05:41
@DiweiSun
Author

If using --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", why does num_heads still vary? Is it due to TP?

I followed Wenjun's PR to prepare the configuration, where the model-specific settings are appended directly to the default configuration. This approach is currently pending further discussion.
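A hypothetical sketch of that approach, using the default num_heads sweep visible in the results above (the function and variable names are illustrative, not the script's actual code): the model-specific head count is appended to the default list rather than replacing it, which is why num_heads still varies when --model-name is passed.

DEFAULT_NUM_HEADS = [12, 32, 40, 48, 64, 96, 128]  # default sweep, matching the table above

def build_num_heads_sweep(model_num_kv_heads=None):
    heads = list(DEFAULT_NUM_HEADS)
    if model_num_kv_heads is not None and model_num_kv_heads not in heads:
        heads.append(model_num_kv_heads)  # appended to, not substituted for, the defaults
    return heads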

