Hello, is there an efficient implementation of Mixture of Experts models (e.g. Mixtral) anywhere that uses Flax/JAX? As an aside: is block-sparse matmul efficient in JAX/Flax?
Answered by cgarciae, Jun 27, 2024
Check out the MoE implementation in Flaxformer (the google/flaxformer repo).
Answer selected by SamKG
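
For anyone landing here, below is a minimal sketch of what a switch-style (top-1) MoE feed-forward layer can look like in Flax linen. Everything in it (the `MoELayer` module, its dimensions, and the dense dispatch via `einsum`) is illustrative and not taken from Flaxformer; a real implementation would dispatch tokens to their routed experts rather than running every expert on every token.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class MoELayer(nn.Module):
    """Hypothetical top-1 (switch-style) MoE FFN layer; dense dispatch for clarity."""
    num_experts: int = 8
    d_model: int = 512
    d_ff: int = 2048

    @nn.compact
    def __call__(self, x):  # x: [batch, seq, d_model]
        # Router: project each token to per-expert logits, softmax into
        # routing probabilities, and pick the top-1 expert per token.
        gate_logits = nn.Dense(self.num_experts, use_bias=False, name="router")(x)
        gate_probs = jax.nn.softmax(gate_logits, axis=-1)      # [b, s, e]
        expert_idx = jnp.argmax(gate_probs, axis=-1)           # [b, s]
        top_prob = jnp.max(gate_probs, axis=-1, keepdims=True) # [b, s, 1]

        # Expert parameters: one pair of FFN weight matrices per expert.
        w_in = self.param("w_in", nn.initializers.lecun_normal(),
                          (self.num_experts, self.d_model, self.d_ff))
        w_out = self.param("w_out", nn.initializers.lecun_normal(),
                           (self.num_experts, self.d_ff, self.d_model))

        # Dense dispatch: run every expert on every token, then select the
        # routed expert's output. Simple but costs num_experts x the FFN
        # compute -- efficient implementations dispatch tokens to experts.
        h = nn.relu(jnp.einsum("bsd,edf->besf", x, w_in))       # [b, e, s, f]
        y = jnp.einsum("besf,efd->besd", h, w_out)              # [b, e, s, d]
        one_hot = jax.nn.one_hot(expert_idx, self.num_experts)  # [b, s, e]
        y = jnp.einsum("besd,bse->bsd", y, one_hot)             # [b, s, d]
        # Scale by the router probability so the gate receives gradients.
        return y * top_prob


# Usage: initialize and apply like any other linen module.
layer = MoELayer()
x = jnp.ones((2, 16, 512))
params = layer.init(jax.random.PRNGKey(0), x)
y = layer.apply(params, x)
```

On the block-sparse aside: to my knowledge plain `jnp` has no optimized block-sparse matmul primitive, so efficient block-sparse MoE kernels in JAX are typically custom kernels (e.g. written with `jax.experimental.pallas`). The dense formulation above sidesteps that at the cost of running all experts on all tokens.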