Support Gated MLP #8

@Insideyyy

Hello! FlashDMoE is a great piece of work!
Since Gated MLP is widely used as the FFN in many LLMs (such as DeepSeekV3, Qwen3, and Llama), is there a plan to support it?

Gated MLP based on swiGLU:

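$$ \begin{aligned} \mathrm{Swish}(x) &= x \otimes \mathrm{Sigmoid}(x) \\ \mathrm{swiGLU}(\mathrm{src}, W_1, W_2) &= (\mathrm{src} \cdot W_1) \otimes \mathrm{Swish}(\mathrm{src} \cdot W_2) \\ \mathrm{FFN}(\mathrm{src}, W_1, W_2, V) &= \mathrm{swiGLU}(\mathrm{src}, W_1, W_2) \cdot V \end{aligned} $$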
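
To make the request concrete, here is a minimal pure-Python reference of the formulas above. It is only a sketch for checking numerics, not a proposal for how FlashDMoE should implement it. The function names (`sigmoid`, `swish`, `matmul`, `gated_mlp`) and the nested-list matrix layout are my own assumptions for illustration; biases are omitted, as in the formulas.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function for a single float.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x):
    # Swish(x) = x * Sigmoid(x), applied to one element.
    return x * sigmoid(x)


def matmul(a, b):
    # Plain matrix product of nested lists: a is (m, k), b is (k, n).
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def gated_mlp(src, w1, w2, v):
    # FFN(src, W1, W2, V) = ((src . W1) (x) Swish(src . W2)) . V
    # src: (tokens, d_model); w1, w2: (d_model, d_ff); v: (d_ff, d_model)
    linear = matmul(src, w1)   # un-activated branch, src . W1
    gate = matmul(src, w2)     # branch that goes through Swish, src . W2
    hidden = [[l * swish(g) for l, g in zip(l_row, g_row)]
              for l_row, g_row in zip(linear, gate)]
    return matmul(hidden, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(gated_mlp([[1.0, 2.0]], identity, identity, [[1.0], [1.0]]))
```

Compared with a plain two-matrix FFN, this needs two input-side GEMMs over the same `src` plus an element-wise product before the output GEMM. As far as I understand, in Llama-style checkpoints `W_2` here corresponds to the gate projection, `W_1` to the up projection, and `V` to the down projection.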
