[Misc] ThroughputHook #977

Open

flyinghu123 opened this issue Dec 19, 2024 · 1 comment

Comments


flyinghu123 commented Dec 19, 2024

throughput_hook.py only computes the tgs of a single micro batch. Could a global batch size tgs output be added as well?

When accumulative_counts > 1, the last iter of each gradient-accumulation window performs an extra optim.step() compared with the other iters. Averaging the per-micro-batch tgs values therefore reports a tgs larger than the actual one, especially when the optimizer is offloaded and accumulative_counts is small.

For example, consider a single node with a single GPU and accumulative_counts = 2. Assume batch size 1 and sequence_len $s$: the first iter's tgs is $\frac{s}{t_1}$ and the second iter's tgs is $\frac{s}{t_2}$. Averaging the two per-iter values gives a gbs tgs of $\frac{1}{2}\left(\frac{s}{t_1} + \frac{s}{t_2}\right) = \frac{s(t_1 + t_2)}{2 t_1 t_2}$,
whereas the actual gbs tgs is $\frac{2s}{t_1 + t_2}$.
The ratio of the two is $\frac{(t_1 + t_2)^2}{4 t_1 t_2} \geqslant 1$.
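
A quick numerical check of this inequality, as a minimal sketch; the timings $t_1 = 0.8$ s and $t_2 = 2.0$ s are made up for illustration, not measurements:

```python
# Numerical check that averaging per-iter tgs overestimates the true
# global-batch tgs; the timings below are illustrative only.
s = 4096            # tokens per micro batch (batch size 1, sequence_len = 4096)
t1, t2 = 0.8, 2.0   # seconds; t2 includes the extra optim.step()

mean_of_iter_tgs = (s / t1 + s / t2) / 2   # what averaging per-iter tgs reports
true_gbs_tgs = 2 * s / (t1 + t2)           # tokens actually processed / wall time

print(mean_of_iter_tgs)                    # 3584.0
print(true_gbs_tgs)                        # ~2925.7
print(mean_of_iter_tgs / true_gbs_tgs)     # (t1 + t2)**2 / (4 * t1 * t2) = 1.225
```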

When sequence_len is fixed, the global batch size tgs can be computed by adding code like the following to throughput_hook.py:

```python
grad_acc_steps = runner.strategy.config['gradient_accumulation_steps']
# Log once per global batch, i.e. on the last micro batch of the window.
if (batch_idx + 1) % grad_acc_steps == 0:
    # micro-batch tokens / mean iter time == global-batch tokens / total window time
    message_hub.update_scalar(
        'train/gbs_tokens_per_sec',
        batch_size * sequence_len /
        (message_hub.get_scalar('train/time').mean(grad_acc_steps) + 1e-12))
```
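
For reference, a self-contained sketch of the same computation outside the hook; `grad_acc_steps`, `step_times`, etc. are placeholder names for the values the hook reads at runtime, not xtuner or mmengine APIs. It also checks that dividing the micro-batch tokens by the mean iter time over the window equals dividing the global-batch tokens by the total window time:

```python
# Standalone sketch of the gbs tgs computation; all names here are
# hypothetical stand-ins for the values available inside the hook.
grad_acc_steps = 2
batch_size, sequence_len = 1, 4096
step_times = [0.8, 2.0]   # per-iter 'train/time' values for one accumulation window

window = step_times[-grad_acc_steps:]
mean_iter_time = sum(window) / grad_acc_steps

# micro-batch tokens / mean iter time == global-batch tokens / total window time
gbs_tgs = batch_size * sequence_len / (mean_iter_time + 1e-12)
assert abs(gbs_tgs - grad_acc_steps * batch_size * sequence_len / sum(window)) < 1e-6
print(gbs_tgs)   # ~2925.7 tokens/s
```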
@CokeDong

related issue: #967 (comment)
