
Could the model trained in A100 use FlashMLA in H100 for inference? #40

Open
hdjsjyl opened this issue Feb 25, 2025 · 3 comments

@hdjsjyl

hdjsjyl commented Feb 25, 2025

Hi Author,
Thanks for the release. I'd like to ask the question in the title: I trained my model on A100 GPUs, and now I hope to use FlashMLA to speed up inference on an H100. Is that possible? If so, do I need to make any changes? Any suggestions would be appreciated, thanks!

@foreverlms

A model's structure has nothing to do with the hardware architecture or platform; training and inference can run on different GPUs. So yes, you can, as long as the model's attention architecture is compatible with MLA.

@hdjsjyl

hdjsjyl commented Feb 25, 2025

Hi @foreverlms , really appreciate your reply.

That's exactly what I want to know.
If I define a transformer-based model during training and want to use MLA during inference, what should the transformer look like? A simple code snippet would be enough to illustrate, if possible.

Thank you so much!

@foreverlms

> Hi @foreverlms , really appreciate your reply.
>
> That's exactly what I want to know. If I define a transformer-based model during training and want to use MLA during inference, what should the transformer look like? A simple code snippet would be enough to illustrate, if possible.
>
> Thank you so much!

You can refer to modeling_deepseek.py on the Hugging Face hub for the DeepSeek models. That script uses an `if` to check whether it is running in training or inference mode:

https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py#L574
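To make the pattern concrete, here is a minimal NumPy sketch of that training/inference branch, not the actual modeling_deepseek.py code. The class name `MLAAttentionSketch`, the `kv_lora_rank` size, and the identity query projection are all illustrative assumptions. The idea it shows: training recomputes K/V from scratch each step, while inference appends to a compressed latent KV cache, which is the kind of cache a FlashMLA-style kernel consumes on Hopper GPUs.

```python
import numpy as np

class MLAAttentionSketch:
    """Hypothetical single-head MLA-style attention with a training/inference branch."""

    def __init__(self, d_model=64, kv_lora_rank=16, rng=None):
        rng = rng or np.random.default_rng(0)
        self.d_model = d_model
        # Down-projection compresses hidden states into a small latent;
        # up-projections restore K and V from that latent.
        self.w_down = rng.standard_normal((d_model, kv_lora_rank)) / np.sqrt(d_model)
        self.w_up_k = rng.standard_normal((kv_lora_rank, d_model)) / np.sqrt(kv_lora_rank)
        self.w_up_v = rng.standard_normal((kv_lora_rank, d_model)) / np.sqrt(kv_lora_rank)
        self.training = True
        self.latent_cache = None  # only populated at inference time

    def forward(self, x):
        latent = x @ self.w_down  # (seq, kv_lora_rank)
        if self.training:
            # Training branch: no cache, recompute from the full sequence.
            kv_latent = latent
        else:
            # Inference branch: append new tokens to the compressed latent
            # cache instead of caching full-width K/V tensors.
            if self.latent_cache is None:
                self.latent_cache = latent
            else:
                self.latent_cache = np.concatenate([self.latent_cache, latent], axis=0)
            kv_latent = self.latent_cache
        k = kv_latent @ self.w_up_k
        v = kv_latent @ self.w_up_v
        q = x  # identity query projection, for brevity
        scores = q @ k.T / np.sqrt(self.d_model)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v
```

The point of the split is that the trained weights (`w_down`, `w_up_k`, `w_up_v`) are identical in both branches, so a model trained on A100s can take the inference path on an H100 unchanged; only the runtime caching/kernel differs.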
