
Conversation

@DiweiSun

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from 4738963 to e29841b on September 19, 2025 at 05:11
DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from e29841b to cfe8f7a on September 19, 2025 at 05:16
@DiweiSun
Author

DiweiSun commented Sep 19, 2025

BKC (best known configuration) to run:

python benchmark/benchmark_reshape_and_cache.py

or, for a specific model:
python benchmark/benchmark_reshape_and_cache.py --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
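For context, a kernel microbenchmark of this kind usually reports the average latency of a kernel launch after a warmup phase, timed around an explicit device synchronize. The sketch below shows only that pattern; the helper name, iteration counts, and the torch.xpu synchronize call are assumptions for illustration and may not match the actual script.

import time

import torch

def time_kernel(fn, warmup: int = 10, iters: int = 100) -> float:
    """Average latency of a zero-argument kernel launch, in microseconds."""
    for _ in range(warmup):
        fn()
    torch.xpu.synchronize()  # assumes an Intel XPU device; use torch.cuda.synchronize() on CUDA
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.xpu.synchronize()
    return (time.perf_counter() - start) / iters * 1e6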

Results:

Before:

num_tokens num_heads head_size block_size num_blocks dtype kv_cache_dtype latency (us)
2 8 128 64 1024 half auto 10.452
4 8 128 64 1024 half auto 9.555
8 8 128 64 1024 half auto 9.926
16 8 128 64 1024 half auto 9.437
32 8 128 64 1024 half auto 10.450
64 8 128 64 1024 half auto 9.904
128 8 128 64 1024 half auto 46.160
256 8 128 64 1024 half auto 201.855
512 8 128 64 1024 half auto 424.268
1024 8 128 64 1024 half auto 846.070
2048 8 128 64 1024 half auto 1697.350

After:

Test num_tokens num_heads head_size block_size num_blocks vLLM latency (us) IPEX latency (us)
0 2.0 12.0 128.0 64.0 1024.0 21.138 14.040
1 2.0 32.0 128.0 64.0 1024.0 16.172 16.380
2 2.0 40.0 128.0 64.0 1024.0 17.472 17.628
3 2.0 48.0 128.0 64.0 1024.0 18.564 18.876
4 2.0 64.0 128.0 64.0 1024.0 20.228 20.384
5 2.0 96.0 128.0 64.0 1024.0 24.076 24.232
6 2.0 128.0 128.0 64.0 1024.0 27.768 27.768
7 4.0 12.0 128.0 64.0 1024.0 15.496 15.444
8 4.0 32.0 128.0 64.0 1024.0 111.306 433.784
9 4.0 40.0 128.0 64.0 1024.0 21.060 21.164
10 4.0 48.0 128.0 64.0 1024.0 180.726 369.460
11 4.0 64.0 128.0 64.0 1024.0 26.052 25.792
12 4.0 96.0 128.0 64.0 1024.0 32.448 32.032
13 4.0 128.0 128.0 64.0 1024.0 203.632 132.704
14 8.0 12.0 128.0 64.0 1024.0 19.422 17.680
15 8.0 32.0 128.0 64.0 1024.0 266.292 95.966
16 8.0 40.0 128.0 64.0 1024.0 28.912 30.160
17 8.0 48.0 128.0 64.0 1024.0 31.980 31.824
18 8.0 64.0 128.0 64.0 1024.0 39.416 39.624
19 8.0 96.0 128.0 64.0 1024.0 245.960 55.770
20 8.0 128.0 128.0 64.0 1024.0 271.414 77.012
21 16.0 12.0 128.0 64.0 1024.0 275.184 22.568
22 16.0 32.0 128.0 64.0 1024.0 39.468 39.832
23 16.0 40.0 128.0 64.0 1024.0 291.512 247.130
24 16.0 48.0 128.0 64.0 1024.0 55.744 284.388
25 16.0 64.0 128.0 64.0 1024.0 201.708 335.400
26 16.0 96.0 128.0 64.0 1024.0 131.664 255.008
27 16.0 128.0 128.0 64.0 1024.0 186.784 239.616
28 32.0 12.0 128.0 64.0 1024.0 32.422 31.460
29 32.0 32.0 128.0 64.0 1024.0 215.878 77.012
30 32.0 40.0 128.0 64.0 1024.0 277.056 104.884
31 32.0 48.0 128.0 64.0 1024.0 336.856 290.316
32 32.0 64.0 128.0 64.0 1024.0 189.020 262.002
33 32.0 96.0 128.0 64.0 1024.0 302.744 301.860
34 32.0 128.0 128.0 64.0 1024.0 419.068 398.294
35 64.0 12.0 128.0 64.0 1024.0 259.298 277.004
36 64.0 32.0 128.0 64.0 1024.0 187.044 282.672
37 64.0 40.0 128.0 64.0 1024.0 247.156 246.168
38 64.0 48.0 128.0 64.0 1024.0 295.412 305.396
39 64.0 64.0 128.0 64.0 1024.0 422.760 423.124
40 64.0 96.0 128.0 64.0 1024.0 656.396 657.124
41 64.0 128.0 128.0 64.0 1024.0 892.762 873.288
42 128.0 12.0 128.0 64.0 1024.0 274.092 125.320
43 128.0 32.0 128.0 64.0 1024.0 413.374 410.644
44 128.0 40.0 128.0 64.0 1024.0 532.116 535.002
45 128.0 48.0 128.0 64.0 1024.0 646.958 644.930
46 128.0 64.0 128.0 64.0 1024.0 874.978 884.780
47 128.0 96.0 128.0 64.0 1024.0 1360.424 1329.224
48 128.0 128.0 128.0 64.0 1024.0 1821.014 1834.534
49 256.0 12.0 128.0 64.0 1024.0 285.584 291.200
50 256.0 32.0 128.0 64.0 1024.0 870.896 874.640
51 256.0 40.0 128.0 64.0 1024.0 1111.812 1108.068
52 256.0 48.0 128.0 64.0 1024.0 1327.118 1345.344
53 256.0 64.0 128.0 64.0 1024.0 1805.622 1809.470
54 256.0 96.0 128.0 64.0 1024.0 2778.828 2763.592
55 256.0 128.0 128.0 64.0 1024.0 3720.834 3713.008
56 512.0 12.0 128.0 64.0 1024.0 613.288 624.312
57 512.0 32.0 128.0 64.0 1024.0 1780.870 1781.468
58 512.0 40.0 128.0 64.0 1024.0 2263.430 2242.006
59 512.0 48.0 128.0 64.0 1024.0 2735.954 2743.624
60 512.0 64.0 128.0 64.0 1024.0 3687.424 3672.994
61 512.0 96.0 128.0 64.0 1024.0 5611.918 5614.388
62 512.0 128.0 128.0 64.0 1024.0 7564.310 7538.284
63 1024.0 12.0 128.0 64.0 1024.0 1282.112 1286.766
64 1024.0 32.0 128.0 64.0 1024.0 3596.528 3574.220
65 1024.0 40.0 128.0 64.0 1024.0 4573.244 4568.980
66 1024.0 48.0 128.0 64.0 1024.0 5542.264 5533.216
67 1024.0 64.0 128.0 64.0 1024.0 7370.090 7438.262
68 1024.0 96.0 128.0 64.0 1024.0 11320.296 11353.836
69 1024.0 128.0 128.0 64.0 1024.0 15117.284 15136.446
70 2048.0 12.0 128.0 64.0 1024.0 2607.332 2591.680
71 2048.0 32.0 128.0 64.0 1024.0 7277.816 7262.684
72 2048.0 40.0 128.0 64.0 1024.0 9194.432 9175.192
73 2048.0 48.0 128.0 64.0 1024.0 11150.646 11144.952
74 2048.0 64.0 128.0 64.0 1024.0 14891.396 14904.058
75 2048.0 96.0 128.0 64.0 1024.0 22627.514 22739.002
76 2048.0 128.0 128.0 64.0 1024.0 30439.552 30527.848

@jikunshang
Collaborator

cc @zufangzhu

@jikunshang
Collaborator

If using --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", why does num_heads still vary? Is it due to TP?

@@ -1,411 +1,82 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

I copied this file from vLLM; I think the torch.compile version is worth keeping.

Author

fixed
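For readers unfamiliar with the torch.compile variant discussed above: a compiled reference implementation of reshape_and_cache typically scatters the key/value tensors into the paged KV cache by slot index, roughly as sketched below. The function name, tensor layouts, and signature are illustrative assumptions, not the code that was in this file.

import torch

@torch.compile
def reshape_and_cache_ref(key, value, key_cache, value_cache, slot_mapping, block_size):
    # key, value: [num_tokens, num_heads, head_size]
    # key_cache, value_cache: [num_blocks, block_size, num_heads, head_size] (layout assumed)
    # slot_mapping: [num_tokens], flat slot index of each token in the paged cache
    block_idx = slot_mapping // block_size
    block_off = slot_mapping % block_size
    key_cache[block_idx, block_off] = key
    value_cache[block_idx, block_off] = value

Keeping such a variant alongside the custom kernel lets the benchmark compare the compiled eager path against the hand-written kernel on the same inputs.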

DiweiSun force-pushed the kernel_benchmark/reshape_and_cache branch from cfe8f7a to ccd7f37 on September 19, 2025 at 05:41
@DiweiSun
Author

If using --model-name "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", why does num_heads still vary? Is it due to TP?

I followed Wenjun's PR to prepare the configuration, where the model-specific settings are appended directly to the default configuration. This approach is currently pending further discussion.
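A hypothetical sketch of that approach, using the default num_heads sweep visible in the results above (the function and variable names are illustrative, not the script's actual code): the model-specific head count is appended to the default list rather than replacing it, which is why num_heads still varies when --model-name is passed.

DEFAULT_NUM_HEADS = [12, 32, 40, 48, 64, 96, 128]  # default sweep, matching the table above

def build_num_heads_sweep(model_num_kv_heads=None):
    heads = list(DEFAULT_NUM_HEADS)
    if model_num_kv_heads is not None and model_num_kv_heads not in heads:
        heads.append(model_num_kv_heads)  # appended to, not substituted for, the defaults
    return heads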

