Commit d377dd4

levendlee authored and facebook-github-bot committed
Update ReadMe. (#4126)
Summary:
Pull Request resolved: #4126
X-link: facebookresearch/FBGEMM#1207

Reviewed By: Alkaid-Benetnash

Differential Revision: D74772796

fbshipit-source-id: c8f62e276fe7107aa182cec602bc94d5d932b5a9
1 parent 5deef6d commit d377dd4

File tree

2 files changed: +3, -3 lines

fbgemm_gpu/experimental/gen_ai/README.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ Besides FP8/INT4 support, FBGEMM GenAI operators also support:
* GQA: optimized specifically for decoding cases, as detailed in PyTorch's blog on [INT4 decoding](https://pytorch.org/blog/int4-decoding/).
* KV cache quantizations.
* Rotary Positional Embedding (RoPE).
-* MoE [token shuffling](gen_ai/moe/README.md) operators.
+* [MetaShuffling](gen_ai/moe/README.md) MoE operators and examples.

## **1.1 FP8 core API functions**

@@ -59,7 +59,7 @@ pip install fbgemm-gpu-genai

## 2.2 **Llama4 MoE support**

-More coming soon in [MetaShuffling](gen_ai/moe/README.md) kernels.
+* [MetaShuffling](gen_ai/moe/README.md) MoE operators and examples.

# 3. **Llama 3 Related External Coverage**

fbgemm_gpu/experimental/gen_ai/gen_ai/moe/README.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ MetaShuffling MoE kernel support in FBGEMM GenAI kernel library.

Mixture-of-Experts (MoE) is a popular model architecture for large language models (LLMs). Although it reduces computation in training and inference by activating fewer parameters per token, it imposes additional challenges in achieving optimal computation efficiency under high memory and communication pressure, as well as in handling the dynamism and sparsity of the model. Here we introduce a new MoE inference solution, MetaShuffling, which enables us to efficiently deploy Llama 4 models for real-world inference.

-More technical design will be coming soon.
+[Technical design blog](https://pytorch.org/blog/metashuffling-accelerating-llama-4-moe-inference/).

# **Updates**

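The Mixture-of-Experts paragraph in the gen_ai/moe/README.md diff above describes routing each token to a small subset of experts and the data-movement challenge that creates. Below is a minimal, illustrative PyTorch sketch of the token-shuffling idea behind a MetaShuffling-style MoE layer: route, sort tokens by expert so each expert sees a contiguous block, run the experts, then unshuffle. It is only a sketch under assumed toy shapes and toy linear experts; it does not call the actual FBGEMM GenAI kernels, and the names in it (`top_k`, `num_experts`, the `experts` list) are made up for this example.

```python
# Illustrative sketch of MoE token shuffling in plain PyTorch.
# NOT the FBGEMM GenAI API; shapes, router, and experts are toy assumptions.
import torch

torch.manual_seed(0)

num_tokens, hidden_dim = 8, 16
num_experts, top_k = 4, 1

tokens = torch.randn(num_tokens, hidden_dim)
router = torch.nn.Linear(hidden_dim, num_experts, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Linear(hidden_dim, hidden_dim) for _ in range(num_experts)
)

# 1. Route: each token picks its top-k expert(s) and a gating weight.
logits = router(tokens)                                             # [T, E]
gates, expert_ids = torch.topk(logits.softmax(-1), top_k, dim=-1)   # [T, k]
expert_ids = expert_ids.flatten()                                   # top_k == 1 here

# 2. Shuffle: sort tokens by expert id so each expert sees one contiguous
#    block of rows; a real kernel can then run dense (grouped) GEMMs
#    instead of scattered per-token work.
order = torch.argsort(expert_ids)
shuffled = tokens[order]
counts = torch.bincount(expert_ids, minlength=num_experts)

# 3. Run each expert on its contiguous slice.
out_shuffled = torch.empty_like(shuffled)
start = 0
for e, n in enumerate(counts.tolist()):
    if n:
        out_shuffled[start:start + n] = experts[e](shuffled[start:start + n])
    start += n

# 4. Unshuffle back to the original token order and apply the gate.
output = torch.empty_like(out_shuffled)
output[order] = out_shuffled
output = output * gates.flatten().unsqueeze(-1)
print(output.shape)  # torch.Size([8, 16])
```

In a real deployment the sort/group step and the per-expert matmuls are handled by optimized grouped-GEMM kernels rather than a Python loop; the sketch is only meant to make the shuffle/unshuffle data flow concrete.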