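throughput_hook.py only computes the `tgs` of a single `micro batch size`. Could a `global batch size` `tgs` output be added?

When `accumulative_counts > 1`, the last gradient-accumulation `iter` runs one more `optim.step()` than the other `iter`s. Averaging the `tgs` values reported per `micro batch size` therefore gives a result larger than the actual `tgs`, especially with `optim offload` enabled and a small `accumulative_counts`.

For example, consider a single node with a single GPU, where `accumulative_counts` is `2`, `batch size` is `1`, and `sequence_len` is `s`. The first `iter tgs` is $\frac{s}{t_1}$ and the second `iter tgs` is $\frac{s}{t_2}$. Averaging the two `iter tgs` values directly gives a `gbs tgs` of $\frac{(\frac{s}{t_1} + \frac{s}{t_2})}{2} = \frac{s (t_1+ t_2)}{2t_1t_2}$

However, the actual `gbs tgs` should be $\frac{2s}{t_1 + t_2}$

Dividing the former by the latter gives $\frac{(t_1+t_2)^2}{4t_1t_2} \geqslant 1$, with equality only when $t_1 = t_2$.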
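The gap between the two formulas can be checked numerically. The following is a minimal sketch, not code from throughput_hook.py: the function names and the timings (`t_1 = 1.0`, `t_2 = 3.0` seconds, `s = 4096` tokens) are made-up values chosen only to illustrate the case where the last iteration is slower because of `optim.step()`.

```python
def mean_of_micro_tgs(tokens_per_iter, iter_times):
    """Average the per-iteration rates s / t_i (the biased estimate)."""
    rates = [tokens_per_iter / t for t in iter_times]
    return sum(rates) / len(rates)


def global_batch_tgs(tokens_per_iter, iter_times):
    """Total tokens over total time for one global batch."""
    return tokens_per_iter * len(iter_times) / sum(iter_times)


# Hypothetical numbers: two accumulation iterations, the second one
# slower because it also runs the optimizer step.
s = 4096
times = [1.0, 3.0]

naive = mean_of_micro_tgs(s, times)
actual = global_batch_tgs(s, times)
ratio = naive / actual

# Closed form from the derivation: (t_1 + t_2)^2 / (4 * t_1 * t_2)
expected_ratio = (times[0] + times[1]) ** 2 / (4 * times[0] * times[1])

print(f"mean of micro tgs: {naive:.2f}")
print(f"global batch tgs:  {actual:.2f}")
print(f"ratio: {ratio:.4f} (closed form: {expected_ratio:.4f})")
```

With these assumed timings the averaged value overstates throughput by about a third, and the overstatement grows as the two iteration times diverge.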
When `sequence_len` is fixed, the `global batch size tgs` can be computed by adding the following code to throughput_hook.py:
    if (batch_idx + 1) % runner.strategy.config['gradient_accumulation_steps'] == 0:
        message_hub.update_scalar(
            'train/gbs_tokens_per_sec',
            batch_size * sequence_len / (
                message_hub.get_scalar('train/time').mean(
                    runner.strategy.config['gradient_accumulation_steps']) + 1e-12))
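The same idea can be sketched independently of the hook, which may help when the token count is not identical across iterations. The class below is a hypothetical helper, not part of XTuner or MMEngine: `GlobalBatchThroughput`, `update`, and their parameters are names invented for illustration. It sums tokens and elapsed time over one accumulation window and reports total tokens divided by total time.

```python
class GlobalBatchThroughput:
    """Hypothetical tracker: tokens/sec over a full accumulation window."""

    def __init__(self, accumulation_steps):
        if accumulation_steps < 1:
            raise ValueError("accumulation_steps must be >= 1")
        self.accumulation_steps = accumulation_steps
        self._tokens = 0
        self._seconds = 0.0
        self._count = 0

    def update(self, num_tokens, seconds):
        """Record one micro-batch iteration.

        Returns None until the window is complete, then returns the
        throughput for the whole global batch and resets the window.
        """
        self._tokens += num_tokens
        self._seconds += seconds
        self._count += 1
        if self._count < self.accumulation_steps:
            return None
        # Small epsilon guards against division by zero, as in the
        # snippet above.
        tgs = self._tokens / (self._seconds + 1e-12)
        self._tokens = 0
        self._seconds = 0.0
        self._count = 0
        return tgs


# Usage with made-up timings: the second iteration includes optim.step().
tracker = GlobalBatchThroughput(accumulation_steps=2)
first = tracker.update(num_tokens=4096, seconds=1.0)
second = tracker.update(num_tokens=4096, seconds=3.0)
print(first, second)
```

Summing tokens and time before dividing weights each iteration by its duration, which is what the $\frac{2s}{t_1 + t_2}$ expression does for the fixed-length case.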
Related issue: #967 (comment)