max_grad_norm does not take effect #304

Open
yiyepiaoling0715 opened this issue Nov 11, 2024 · 1 comment

Comments

@yiyepiaoling0715

When running SFT with Firefly, the logged grad_norm is always greater than 1.
In the DeepSpeed config, gradient clipping (`gradient_clipping`) is set to `auto`.
[Screenshots 1 and 2]
max_grad_norm=1.0
[Screenshots 3 and 4]
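When pretraining with Firefly using the same DeepSpeed config, clipping works as expected; only in SFT does the grad_norm limit fail to take effect.
grad_norm log from pretraining:
[Screenshot 5]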

@yiyepiaoling0715
Author

[Screenshot: DeepSpeed and transformers versions]
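
For reference when reading the logs above, here is a minimal pure-Python sketch of clipping by global L2 norm, assuming the standard definition (scale every gradient by `max_norm / total_norm` when `total_norm` exceeds `max_norm`). The function name `clip_by_global_norm` and the gradient values are hypothetical and not taken from Firefly. The point it illustrates is that the norm returned by such a routine is usually measured before clipping, as is the case for PyTorch's `torch.nn.utils.clip_grad_norm_`. Whether the SFT run in this issue logs the pre-clip value is an assumption this thread does not confirm.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradients so their joint L2 norm is at most max_norm.

    Returns (clipped_grads, total_norm), where total_norm is the norm
    measured BEFORE clipping (hypothetical helper, for illustration only).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# Hypothetical gradient values with joint norm 5.0.
grads = [3.0, 4.0]
clipped, reported = clip_by_global_norm(grads, max_norm=1.0)
post_norm = math.sqrt(sum(g * g for g in clipped))

print(f"reported (pre-clip) norm: {reported:.3f}")
print(f"post-clip norm: {post_norm:.3f}")
```

If the training loop logs the returned value, a grad_norm above 1.0 in the log is compatible with clipping having been applied. Comparing the norm before and after the clipping step in both the pretraining and SFT runs would show whether the two paths really differ.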
