Hello, is there an efficient implementation of Mixture of Experts models (e.g. Mixtral) anywhere that uses Flax/JAX? As an aside: is block-sparse matmul efficient in JAX/Flax?
Answered by cgarciae, Jun 27, 2024
Check out the MoE implementation in Flaxformer (the google/flaxformer repo).
Answer selected by SamKG
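
For anyone landing here, below is a minimal sketch of what a switch-style (top-1) MoE feed-forward layer can look like in Flax linen. Everything in it (the `MoELayer` module, its dimensions, and the dense dispatch via `einsum`) is illustrative and not taken from Flaxformer; a real implementation would dispatch tokens to their routed experts rather than running every expert on every token.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoELayer(nn.Module):
    """Hypothetical top-1 (switch-style) MoE FFN layer; dense dispatch for clarity."""
    num_experts: int = 8
    d_model: int = 512
    d_ff: int = 2048

    @nn.compact
    def __call__(self, x):  # x: [batch, seq, d_model]
        # Router: project each token to per-expert logits, softmax into
        # routing probabilities, and pick the top-1 expert per token.
        gate_logits = nn.Dense(self.num_experts, use_bias=False, name="router")(x)
        gate_probs = jax.nn.softmax(gate_logits, axis=-1)      # [b, s, e]
        expert_idx = jnp.argmax(gate_probs, axis=-1)           # [b, s]
        top_prob = jnp.max(gate_probs, axis=-1, keepdims=True) # [b, s, 1]

        # Expert parameters: one pair of FFN weight matrices per expert.
        w_in = self.param("w_in", nn.initializers.lecun_normal(),
                          (self.num_experts, self.d_model, self.d_ff))
        w_out = self.param("w_out", nn.initializers.lecun_normal(),
                           (self.num_experts, self.d_ff, self.d_model))

        # Dense dispatch: run every expert on every token, then select the
        # routed expert's output. Simple but costs num_experts x the FFN
        # compute -- efficient implementations dispatch tokens to experts.
        h = nn.relu(jnp.einsum("bsd,edf->besf", x, w_in))       # [b, e, s, f]
        y = jnp.einsum("besf,efd->besd", h, w_out)              # [b, e, s, d]
        one_hot = jax.nn.one_hot(expert_idx, self.num_experts)  # [b, s, e]
        y = jnp.einsum("besd,bse->bsd", y, one_hot)             # [b, s, d]
        # Scale by the router probability so the gate receives gradients.
        return y * top_prob


# Usage: initialize and apply like any other linen module.
layer = MoELayer()
x = jnp.ones((2, 16, 512))
params = layer.init(jax.random.PRNGKey(0), x)
y = layer.apply(params, x)
```

On the block-sparse aside: to my knowledge plain `jnp` has no optimized block-sparse matmul primitive, so efficient block-sparse MoE kernels in JAX are typically custom kernels (e.g. written with `jax.experimental.pallas`). The dense formulation above sidesteps that at the cost of running all experts on all tokens.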