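When using Firefly for SFT, grad_norm is always > 1.
In the DeepSpeed config, gradient_clipping is set to auto.
SFT config: max_grad_norm=1.0
Pretraining config: max_grad_norm=1.0
With the same DeepSpeed config, pretraining with Firefly works and clipping takes effect; only in SFT does grad_norm clipping appear to have no effect.
grad_norm log from pretraining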
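One thing that may be worth ruling out (an assumption, not something confirmed in this thread) is whether the logged grad_norm is measured before clipping is applied. PyTorch's `torch.nn.utils.clip_grad_norm_`, for example, returns the total norm computed before scaling, so a logged value above max_grad_norm does not by itself show that clipping was skipped. The sketch below is a plain-Python illustration of global-norm clipping, not Firefly or DeepSpeed code; `clip_by_global_norm` is a hypothetical helper written for this example.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Hypothetical helper: clip a flat list of gradients by global L2 norm.

    Returns (pre_clip_norm, clipped_grads). The first value is the norm
    measured BEFORE scaling, which is what a logger might report.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only when the norm exceeds max_norm; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [g * scale for g in grads]
    return total_norm, clipped


grads = [3.0, 4.0]  # toy gradient with L2 norm 5
pre_norm, clipped = clip_by_global_norm(grads, max_norm=1.0)
post_norm = math.sqrt(sum(g * g for g in clipped))

print(f"pre-clip norm:  {pre_norm:.3f}")   # can exceed max_norm
print(f"post-clip norm: {post_norm:.3f}")  # bounded by max_norm
```

If the SFT log reports the pre-clip value while the pretraining gradients simply stay under 1.0, both runs could be clipping correctly. Comparing the loss curves or logging the post-clip norm directly would help tell the two cases apart.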
deepspeed and transformers versions