[Dlight][CPU] Add CPU Backend Support for GEMV Optimization #17663

Open
wants to merge 5 commits into base: main

Conversation

mengshyu
Contributor

This PR adds Dlight CPU support with optimized GEMV scheduling, including pattern detection, loop tiling, vectorization, and parallel execution. It improves maintainability by refining target checks, reduction handling, and scheduling logic.
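
For reference, a minimal sketch (not the exact rule added in this PR) of the kind of TensorIR schedule transformations described above, applied to a hand-written GEMV: split for loop tiling, parallel across CPU threads, and vectorize for SIMD. The kernel shape, block name, and tile factor of 8 are illustrative assumptions.

import tvm
from tvm.script import tir as T

@T.prim_func
def gemv(a: T.handle, b: T.handle, c: T.handle):
    # Illustrative GEMV: C[i] = sum_k A[i, k] * B[k]
    A = T.match_buffer(a, (4096, 4096), "float32")
    B = T.match_buffer(b, (4096,), "float32")
    C = T.match_buffer(c, (4096,), "float32")
    for i, k in T.grid(4096, 4096):
        with T.block("gemv"):
            vi, vk = T.axis.remap("SR", [i, k])
            with T.init():
                C[vi] = T.float32(0)
            C[vi] = C[vi] + A[vi, vk] * B[vk]

sch = tvm.tir.Schedule(gemv)
block = sch.get_block("gemv")
i, k = sch.get_loops(block)
i_outer, i_inner = sch.split(i, factors=[None, 8])  # loop tiling; factor 8 is an assumption
sch.reorder(i_outer, k, i_inner)                    # keep the row tile innermost
sch.parallel(i_outer)                               # spread row tiles across CPU threads
sch.vectorize(i_inner)                              # SIMD over each row tile
lib = tvm.build(sch.mod, target="llvm")             # build for a generic CPU target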

CPU: AMD Ryzen 9 7950X 16-Core Processor
MODEL: Qwen2-0.5B-q4f16_1-MLC
Prompt: What is the meaning of life?

Results:

Metric                    Baseline                Optimized
prompt_tokens             27                      27
completion_tokens         235                     227
total_tokens              262                     254
prefill_tokens            27                      27
decode_tokens             234                     226
jump_forward_tokens       0                       0
prefill_tokens_per_s      0.9777329325367138      1.0010420333327994
decode_tokens_per_s       0.558195154052001       2.9349053824023454
end_to_end_latency_s      446.823128383           103.976080401
ttft_s                    27.614902906            26.971894387
inter_token_latency_s     1.9013750143957446      0.4580444070528635
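
In these runs, decode throughput improves from 0.558 to 2.935 tokens/s (about 5.3x), and end-to-end latency drops from 446.8 s to 104.0 s (about 4.3x).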

@tqchen
Member

tqchen commented Feb 18, 2025

cc @Hzfengsy, can you help take a look? Also cc @tlopex.

@Hzfengsy
Member

Also cc @HongHongHongL

return buffer_store.value.b


def is_gemv(sch: tir.Schedule, block_info: BlockInfo) -> Optional[List[tir.Buffer]]:
Member

Can we reuse gpu's util functions?

Member

I'm saying that we could create a folder named something like "analysis" or "utils" under the dlight folder, shared by the different backends.

Member

I agree this is a good suggestion; dlight.analysis sounds right.

Contributor Author

Hi @Hzfengsy, I've created an analysis folder so that the CPU and GPU backends reuse shared logic for GEMV. Could you recheck it? Thanks.
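
To make the intent of this thread concrete, a hypothetical sketch of the resulting layout (file paths and helper names are assumptions, not necessarily the exact layout in the PR):

# python/tvm/dlight/analysis/gemv.py   -- shared GEMV pattern analysis (is_gemv, normalize, ...)
# python/tvm/dlight/cpu/gemv.py        -- CPU GEMV scheduling rule
# python/tvm/dlight/gpu/gemv.py        -- GPU GEMV scheduling rule
# Both backend rules would then import the shared helpers instead of duplicating them:
from tvm.dlight.analysis.gemv import is_gemv, normalize  # assumed import path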

return ret if 0 < len(ret) < len(block_stmt.reads) else None


def normalize( # pylint: disable=too-many-locals, use-a-generator
Member

Maybe we can reuse this one as well
