Commit d377dd4

levendlee authored and facebook-github-bot committed
Update ReadMe. (#4126)
Summary:
Pull Request resolved: #4126
X-link: facebookresearch/FBGEMM#1207

Reviewed By: Alkaid-Benetnash

Differential Revision: D74772796

fbshipit-source-id: c8f62e276fe7107aa182cec602bc94d5d932b5a9
1 parent 5deef6d commit d377dd4

File tree

2 files changed: +3, -3 lines

fbgemm_gpu/experimental/gen_ai/README.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ Besides FP8/INT4 support, FBGEMM GenAI operators also support:
* GQA: optimized specifically for decoding cases, as detailed in PyTorch's blog on [INT4 decoding](https://pytorch.org/blog/int4-decoding/).
* KV cache quantizations.
* Rotary Positional Embedding (RoPE).
-* MoE [token shuffling](gen_ai/moe/README.md) operators.
+* [MetaShuffling](gen_ai/moe/README.md) MoE operators and examples.

## **1.1 FP8 core API functions**

@@ -59,7 +59,7 @@ pip install fbgemm-gpu-genai

## 2.2 **Llama4 MoE support**

-More coming soon in [MetaShuffling](gen_ai/moe/README.md) kernels.
+* [MetaShuffling](gen_ai/moe/README.md) MoE operators and examples.

# 3. **Llama 3 Related External Coverage**

fbgemm_gpu/experimental/gen_ai/gen_ai/moe/README.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ MetaShuffling MoE kernel support in FBGEMM GenAI kernel library.

Mixture-of-Experts (MoE) is a popular model architecture for large language models (LLMs). Although it reduces computation in training and inference by activating fewer parameters per token, it imposes additional challenges in achieving optimal computation efficiency under high memory and communication pressure, as well as in handling the dynamism and sparsity of the model. Here we introduce a new MoE inference solution, MetaShuffling, which enables us to efficiently deploy Llama 4 models for real-world inference.

-More technical design will be coming soon.
+[Technical design blog](https://pytorch.org/blog/metashuffling-accelerating-llama-4-moe-inference/).

# **Updates**

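The Mixture-of-Experts paragraph in the gen_ai/moe/README.md diff above describes routing each token to a small subset of experts and the data-movement challenge that creates. Below is a minimal, illustrative PyTorch sketch of the token-shuffling idea behind a MetaShuffling-style MoE layer: route, sort tokens by expert so each expert sees a contiguous block, run the experts, then unshuffle. It is only a sketch under assumed toy shapes and toy linear experts; it does not call the actual FBGEMM GenAI kernels, and the names in it (`top_k`, `num_experts`, the `experts` list) are made up for this example.

```python
# Illustrative sketch of MoE token shuffling in plain PyTorch.
# NOT the FBGEMM GenAI API; shapes, router, and experts are toy assumptions.
import torch

torch.manual_seed(0)

num_tokens, hidden_dim = 8, 16
num_experts, top_k = 4, 1

tokens = torch.randn(num_tokens, hidden_dim)
router = torch.nn.Linear(hidden_dim, num_experts, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)
)

# 1. Route: each token picks its top-k expert(s) and a gating weight.
logits = router(tokens)                                             # [T, E]
gates, expert_ids = torch.topk(logits.softmax(-1), top_k, dim=-1)   # [T, k]
expert_ids = expert_ids.flatten()                                   # top_k == 1 here

# 2. Shuffle: sort tokens by expert id so each expert sees one contiguous
#    block of rows; a real kernel can then run dense (grouped) GEMMs
#    instead of scattered per-token work.
order = torch.argsort(expert_ids)
shuffled = tokens[order]
counts = torch.bincount(expert_ids, minlength=num_experts)

# 3. Run each expert on its contiguous slice.
out_shuffled = torch.empty_like(shuffled)
start = 0
for e, n in enumerate(counts.tolist()):
    if n:
        out_shuffled[start:start + n] = experts[e](shuffled[start:start + n])
    start += n

# 4. Unshuffle back to the original token order and apply the gate.
output = torch.empty_like(out_shuffled)
output[order] = out_shuffled
output = output * gates.flatten().unsqueeze(-1)
print(output.shape)  # torch.Size([8, 16])
```

In a real deployment the sort/group step and the per-expert matmuls are handled by optimized grouped-GEMM kernels rather than a Python loop; the sketch is only meant to make the shuffle/unshuffle data flow concrete.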