[Misc] ThroughputHook #977

Open

flyinghu123 opened this issue Dec 19, 2024 · 1 comment

Comments


flyinghu123 commented Dec 19, 2024

throughput_hook.py only computes the tgs of a single micro batch. Could a global batch size tgs output be added as well?

When accumulative_counts > 1, the last iter of each gradient-accumulation window performs an extra optim.step() compared with the other iters. Averaging the per-micro-batch tgs values therefore reports a tgs larger than the actual one, especially when the optimizer is offloaded and accumulative_counts is small.

For example, consider a single node with a single GPU and accumulative_counts = 2. Assume batch size 1 and sequence_len $s$: the first iter's tgs is $\frac{s}{t_1}$ and the second iter's tgs is $\frac{s}{t_2}$. Averaging the two per-iter values gives a gbs tgs of $\frac{1}{2}\left(\frac{s}{t_1} + \frac{s}{t_2}\right) = \frac{s(t_1 + t_2)}{2 t_1 t_2}$,
whereas the actual gbs tgs is $\frac{2s}{t_1 + t_2}$.
The ratio of the two is $\frac{(t_1 + t_2)^2}{4 t_1 t_2} \geqslant 1$.
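
A quick numerical check of this inequality, as a minimal sketch; the timings $t_1 = 0.8$ s and $t_2 = 2.0$ s are made up for illustration, not measurements:

```python
# Numerical check that averaging per-iter tgs overestimates the true
# global-batch tgs; the timings below are illustrative only.
s = 4096            # tokens per micro batch (batch size 1, sequence_len = 4096)
t1, t2 = 0.8, 2.0   # seconds; t2 includes the extra optim.step()

mean_of_iter_tgs = (s / t1 + s / t2) / 2   # what averaging per-iter tgs reports
true_gbs_tgs = 2 * s / (t1 + t2)           # tokens actually processed / wall time

print(mean_of_iter_tgs)                    # 3584.0
print(true_gbs_tgs)                        # ~2925.7
print(mean_of_iter_tgs / true_gbs_tgs)     # (t1 + t2)**2 / (4 * t1 * t2) = 1.225
```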

When sequence_len is fixed, the global batch size tgs can be computed by adding code like the following to throughput_hook.py:

```python
grad_acc_steps = runner.strategy.config['gradient_accumulation_steps']
# Log once per global batch, i.e. on the last micro batch of the window.
if (batch_idx + 1) % grad_acc_steps == 0:
    # micro-batch tokens / mean iter time == global-batch tokens / total window time
    message_hub.update_scalar(
        'train/gbs_tokens_per_sec',
        batch_size * sequence_len /
        (message_hub.get_scalar('train/time').mean(grad_acc_steps) + 1e-12))
```
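
For reference, a self-contained sketch of the same computation outside the hook; `grad_acc_steps`, `step_times`, etc. are placeholder names for the values the hook reads at runtime, not xtuner or mmengine APIs. It also checks that dividing the micro-batch tokens by the mean iter time over the window equals dividing the global-batch tokens by the total window time:

```python
# Standalone sketch of the gbs tgs computation; all names here are
# hypothetical stand-ins for the values available inside the hook.
grad_acc_steps = 2
batch_size, sequence_len = 1, 4096
step_times = [0.8, 2.0]   # per-iter 'train/time' values for one accumulation window

window = step_times[-grad_acc_steps:]
mean_iter_time = sum(window) / grad_acc_steps

# micro-batch tokens / mean iter time == global-batch tokens / total window time
gbs_tgs = batch_size * sequence_len / (mean_iter_time + 1e-12)
assert abs(gbs_tgs - grad_acc_steps * batch_size * sequence_len / sum(window)) < 1e-6
print(gbs_tgs)   # ~2925.7 tokens/s
```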
@CokeDong

related issue: #967 (comment)
