Could you support mainstream models such as qwen, gemma, and mistral? #6

Open
ImmoCat-Git opened this issue Mar 20, 2025 · 1 comment

Comments

@ImmoCat-Git

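Also, could you support the ms-swift framework from the ModelScope community?
In addition, when I tried single-GPU fine-tuning of qwen2.5-1.5b and 3b with apollo in llamafactory, GPU memory filled up completely and training was very slow. My hardware is 1x 3080 Ti 12 GB and 1x Tesla T10 16 GB.
llamafactory's default setting is {"optim": "adamw_torch"}, and {"optim": "apollo_adamw"} cannot be used.
Could you provide more support for llamafactory? Mainly, could the script usage instructions in README.md be more detailed? I don't know how to modify and invoke apollo to run q-apollo-mini fine-tuning on a single GPU or on multiple GPUs.
I would like to use q-apollo-mini to fine-tune qwen 3b. Thanks!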

@zhuhanqing
Owner

zhuhanqing commented Mar 30, 2025

Hi, APOLLO can be used to train different models. The example shows how to use APOLLO in llamafactory. You cannot set {"optim": "apollo_adamw"} directly, but you can set the arg to use apollo.
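As a rough illustration of enabling APOLLO through a dedicated argument rather than through `optim`, the sketch below builds a partial llamafactory-style config in Python and serializes it to JSON. The key names (`use_apollo`, `apollo_target`, `apollo_rank`, `apollo_scale`, `apollo_scale_type`) are assumptions based on llamafactory's APOLLO example config and should be checked against the installed version. The helper `build_apollo_config`, the model id, and the output path are hypothetical, and the rank and scale values are assumed APOLLO-Mini-style settings. It does not cover the quantized q-apollo-mini variant.

```python
import json


def build_apollo_config(model_name, output_dir, rank=1, scale=128.0, scale_type="tensor"):
    """Return a partial training config that switches APOLLO on via its own flag.

    All apollo_* key names are assumptions; verify them against the
    llamafactory version you have installed.
    """
    return {
        "model_name_or_path": model_name,
        "stage": "sft",
        "do_train": True,
        "finetuning_type": "full",
        # APOLLO is enabled by a dedicated flag, not by {"optim": "apollo_adamw"}.
        "use_apollo": True,
        "apollo_target": "all",
        "apollo_rank": rank,              # rank 1: assumed APOLLO-Mini-style setting
        "apollo_scale": scale,
        "apollo_scale_type": scale_type,
        "output_dir": output_dir,
    }


# Illustrative values only; a real run also needs dataset, template, and
# batch-size settings that are omitted here.
config = build_apollo_config("Qwen/Qwen2.5-3B", "saves/qwen2.5-3b-apollo")
config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed JSON is only a fragment: it would need to be merged with the remaining training settings before being passed to the llamafactory training command.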

You can also try APOLLO using HF Transformers following the doc.
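For the HF Transformers route, a minimal sketch of the relevant trainer settings might look like the following. It only assembles keyword arguments as a plain dict, so it runs without Transformers installed. The `optim_args` keys (`proj`, `rank`, `scale`, `scale_type`, `update_proj_gap`), the target-module patterns, and the APOLLO-Mini-style values are assumptions recalled from the Transformers documentation and should be verified there. The helper name `apollo_mini_trainer_kwargs` and the output directory are hypothetical.

```python
def apollo_mini_trainer_kwargs(output_dir, rank=1, scale=128.0, update_proj_gap=200):
    """Assemble keyword arguments intended for transformers.TrainingArguments.

    The optim_args keys and values are assumptions; check the Transformers
    APOLLO documentation for the version you use.
    """
    optim_args = ",".join([
        "proj=random",
        f"rank={rank}",
        f"scale={scale}",
        "scale_type=tensor",
        f"update_proj_gap={update_proj_gap}",
    ])
    return {
        "output_dir": output_dir,
        "optim": "apollo_adamw",
        # Regex patterns selecting which modules APOLLO is applied to (assumed).
        "optim_target_modules": [r".*.attn.*", r".*.mlp.*"],
        "optim_args": optim_args,
    }


kwargs = apollo_mini_trainer_kwargs("./qwen-apollo-mini")
print(kwargs["optim_args"])

# With Transformers and the APOLLO optimizer package installed, these would be
# passed on roughly as: TrainingArguments(**kwargs)
```

Keeping the settings in a dict makes it easy to compare single-GPU and multi-GPU runs by changing only the surrounding launch setup.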

For ms-swift, we would consider integrating it when we have more bandwidth or if their maintainers show interest in working together. Thank you.
