Expert Specialization MoE Kernel

Introduction

This project is inspired by this issue. The load on experts varies by scenario and changes dynamically. Furthermore, the distribution is usually uneven: typically, most experts process only a small number of tokens, while a few experts handle a very large number.

Currently, Grouped GEMMs based on CUTLASS or Triton have become the standard solution for MoE modules. These implementations typically use a single matrix tiling strategy, such as (128, 128, 128). However, a single tiling strategy cannot handle every token count efficiently. When an expert has few tokens, a large matrix tile leads to unnecessary computation; conversely, when an expert has many tokens, a small matrix tile may fail to fully utilize the hardware.
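To make the trade-off concrete, here is a minimal sketch (the tile size and per-expert token counts below are hypothetical, not taken from this repository) of how a single BLOCK_M pads every expert's row count up to a tile multiple:

# Illustration only: with one fixed BLOCK_M, each expert's M dimension is
# rounded up to a tile multiple, so lightly loaded experts pay for rows that
# carry no real work. All values are hypothetical.
tile_m = 128
expert_tokens = [7, 12, 9, 15, 2048, 1900]  # uneven per-expert token counts

padded_rows = [((m + tile_m - 1) // tile_m) * tile_m for m in expert_tokens]
wasted_rows = sum(p - m for p, m in zip(padded_rows, expert_tokens))
print(f"real rows: {sum(expert_tokens)}, padded rows: {sum(padded_rows)}, wasted: {wasted_rows}")

The four small experts here are each padded from roughly a dozen rows up to 128, while the two large experts dominate the useful work.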

Therefore, I implemented a new MoE module solution based on CUTLASS: it builds multiple kernels with different matrix tile sizes and dynamically dispatches work to them based on each expert's problem size. To hide the overhead of kernel prologues and epilogues, I use the PDL (Programmatic Dependent Launch) feature.
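As a rough sketch of the dispatch idea (the tile shapes, thresholds, and function names below are illustrative placeholders I chose for this example; the real selection logic lives in the CUDA/CUTLASS extension):

# Sketch only: route each expert to a kernel specialized for a tile size that
# roughly matches its token count, then run one grouped GEMM per group.
# Tile shapes and thresholds are placeholders, not the tuned values.
from collections import defaultdict

def select_tile(num_tokens: int) -> tuple[int, int, int]:
    # Small experts get a small BLOCK_M; heavily loaded experts get the
    # classic (128, 128, 128) tile.
    if num_tokens <= 16:
        return (16, 128, 128)
    if num_tokens <= 64:
        return (64, 128, 128)
    return (128, 128, 128)

def group_experts_by_tile(expert_tokens: list[int]) -> dict:
    # Group expert IDs by the tile their GEMM should use.
    groups = defaultdict(list)
    for expert_id, m in enumerate(expert_tokens):
        groups[select_tile(m)].append(expert_id)
    return dict(groups)

print(group_experts_by_tile([7, 12, 9, 15, 2048, 1900]))
# -> {(16, 128, 128): [0, 1, 2, 3], (128, 128, 128): [4, 5]}

In the real kernels, each group would be handled by its own tile-specialized grouped GEMM, and PDL allows the next kernel's prologue to overlap with the previous kernel's epilogue so the extra launches do not add serial overhead.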

Install

git clone --recursive https://github.com/HydraQYH/expert_specialization_moe.git
cd expert_specialization_moe
python3 setup.py install

Unit Test (Accuracy)

pytest -s ./test/test_es_grouped_gemm.py

Benchmark (Performance)

python3 ./benchmark/benchmark_es_fp8_blockwise_moe.py

Result

I have not tuned the performance yet. However, preliminary results on the H20 GPU show that the Expert Specialization MoE Kernel offers significant performance improvements over sgl-kernel (0.3.9.post2), especially when the workload across experts is unbalanced:

[benchmark results figure]

TODO

Currently, I support only one type of Grouped GEMM (SM90 FP8 Blockwise). The two main follow-up tasks are:

  • Support more types of Grouped GEMM.
  • Tune the performance for different token ranges.

Contact

If you have any questions or suggestions, or would like to participate in the development, please feel free to contact me. Both of these email addresses are valid:
