Hi Author,
Thanks for the release. I have a question, as in the title.
I have trained the model on an A100, and now I hope to use FlashMLA to speed up inference on an H1200. Is that possible? If yes, do I need to make any changes? Any suggestions would be appreciated, thanks.
The model's structure has nothing to do with the GPU architecture/platform.
Training and inference can be split.
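A minimal sketch of that point, with a toy module standing in for the real model (the filename is a placeholder): a PyTorch state dict saved during training carries no GPU or architecture information, so the same file loads unchanged on a different card for inference.

```python
import torch
import torch.nn as nn

# Hypothetical tiny module standing in for the real Transformer; the point is
# only that the saved checkpoint is device-agnostic.
model = nn.Linear(8, 8)
torch.save(model.state_dict(), "model_trained_on_a100.pt")  # saved during A100 training

# Later, on a different GPU: the same state dict loads unchanged.
model2 = nn.Linear(8, 8)
model2.load_state_dict(torch.load("model_trained_on_a100.pt", map_location="cpu"))
model2.eval()  # then move to whatever accelerator is present, e.g. model2.to("cuda")
```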
You could, as long as the model architecture is compatible with MLA.
That's what I really want to know.
If I define a model based on a standard Transformer during training and want to use MLA during inference, what does the Transformer look like? A simple code snippet would be enough for illustration, if possible.
Thank you so much!
You could refer to modeling_deepseek.py on the Hugging Face pages of the DeepSeek models. That script uses an if check to distinguish whether it is running in training or in inference.
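To illustrate what such a switch can look like, here is a minimal sketch, assuming a simplified MLA block (RoPE, normalization, and the decoupled head split are omitted). It is not the actual modeling_deepseek.py or the FlashMLA API; the class name and the kernel call site are placeholders. The same weights serve both branches: the training branch materializes full K/V and runs plain causal attention, while the inference branch caches only the compressed latent, which is the layout a FlashMLA-style kernel on a Hopper GPU consumes.

```python
# Minimal illustrative sketch -- NOT the actual modeling_deepseek.py or FlashMLA API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketchAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, kv_lora_rank: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.kv_down = nn.Linear(dim, kv_lora_rank, bias=False)  # compress to latent
        self.k_up = nn.Linear(kv_lora_rank, dim, bias=False)     # expand latent -> K
        self.v_up = nn.Linear(kv_lora_rank, dim, bias=False)     # expand latent -> V
        self.o_proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x, kv_cache=None):
        # Assumes either full-sequence prefill (kv_cache is None) or
        # single-token decode (q_len == 1) -- the two common cases.
        bsz, q_len, _ = x.shape
        q = self.q_proj(x).view(bsz, q_len, self.n_heads, self.head_dim).transpose(1, 2)
        c_kv = self.kv_down(x)  # compressed latent, (bsz, q_len, kv_lora_rank)

        if self.training:
            # Training path (e.g. on the A100): materialize full K/V from the
            # latent and run ordinary causal attention -- no KV cache involved.
            k = self.k_up(c_kv).view(bsz, q_len, self.n_heads, self.head_dim).transpose(1, 2)
            v = self.v_up(c_kv).view(bsz, q_len, self.n_heads, self.head_dim).transpose(1, 2)
            out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            new_cache = None
        else:
            # Inference path: only the small latent goes into the cache. This
            # compressed cache is what a FlashMLA-style kernel on a Hopper GPU
            # would consume; the kernel call is left as a placeholder, and a
            # plain-PyTorch fallback keeps the sketch runnable anywhere.
            new_cache = c_kv if kv_cache is None else torch.cat([kv_cache, c_kv], dim=1)
            k = self.k_up(new_cache).view(bsz, -1, self.n_heads, self.head_dim).transpose(1, 2)
            v = self.v_up(new_cache).view(bsz, -1, self.n_heads, self.head_dim).transpose(1, 2)
            # out = flash_mla_decode(q, new_cache, ...)  # hypothetical FlashMLA call site
            out = F.scaled_dot_product_attention(q, k, v, is_causal=(kv_cache is None))

        out = out.transpose(1, 2).reshape(bsz, q_len, -1)
        return self.o_proj(out), new_cache
```

In use, training just calls the module under model.train() with no cache, while inference calls it under model.eval() and threads the returned cache through the decode loop.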