Open
Description
Hello, thank you for your work. I'd like to ask some questions about expert choice routing:
- While looking for code implementations of this part, I found that Google's implementation mentions in a comment that expert choice routing is not suitable for decoder-only architectures (https://github.com/google/flaxformer/blob/main/flaxformer/architectures/moe/routing.py#L655). I'm curious about how you handled this issue in your experiments.
- I'd like to know how you perform expert choice routing during inference. With a KV cache, the FFN layer typically processes only one token per decoding step. Or does the expert choice routing model not use a KV cache during inference?
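To make both concerns concrete, here is a minimal NumPy sketch of the problem as I understand it. It is not taken from this repository or from Flaxformer; the function name `expert_choice_assignments`, the `capacity` parameter, and the router scores are all made up for illustration. Each expert picks its top-`capacity` tokens from whatever tokens are in the batch, so appending a later token can change which expert processes an earlier one, and with a single token every expert ends up picking it.

```python
import numpy as np

def expert_choice_assignments(scores, capacity):
    """Hypothetical helper: each expert selects its top-`capacity` tokens.

    scores: [num_tokens, num_experts] router affinities (illustrative values).
    Returns a boolean mask [num_tokens, num_experts], True where an expert
    selected a token.
    """
    num_tokens, num_experts = scores.shape
    k = min(capacity, num_tokens)
    # Sort tokens per expert (per column) by descending score, keep the top k.
    top_idx = np.argsort(-scores, axis=0, kind="stable")[:k]  # [k, num_experts]
    mask = np.zeros((num_tokens, num_experts), dtype=bool)
    mask[top_idx, np.arange(num_experts)] = True
    return mask

# Made-up router scores for a 3-token prefix and 2 experts.
prefix = np.array([[0.90, 0.10],
                   [0.20, 0.80],
                   [0.50, 0.40]])
# The same prefix with one "future" token appended.
full = np.vstack([prefix, [[0.95, 0.05]]])

mask_prefix = expert_choice_assignments(prefix, capacity=1)
mask_full = expert_choice_assignments(full, capacity=1)

# Token 0 is picked by expert 0 when only the prefix is visible, but loses
# its slot once the later token 3 is in the batch: routing is not causal.
print(mask_prefix[0, 0], mask_full[0, 0])

# Single-token decoding step (as with a KV cache): with only one token in
# the pool, every expert selects it, so the top-k choice is degenerate.
single = expert_choice_assignments(full[-1:], capacity=1)
print(single)
```

In this sketch the routing decision for token 0 depends on a token that comes after it, which is what I assume the Flaxformer comment is warning about for decoder-only models, and the single-token case shows why I am unsure how the selection is meant to work with a KV cache.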