Possible logic gap in the evaluation step #20

@HanlardResearch

I ran ./GPG/open-r1/train.sh.

Training runs without problems, but during evaluation (with --eval_strategy steps set), the tensor dimensions in the following line do not match:

per_token_loss = - per_token_logps * advantages.unsqueeze(1)

After debugging, I found that the problem comes from the following code in the function _generate_and_score_completions (call it snippet 1):

if n_valid_samples < self.args.min_inverse_alpha * num_samples:
    logger.info(f"keep generating more examples: the {n_gen}-th mini-batch")
    n_gen += 1

else:
    # Reassemble the sample batch
    rewards = merge(identical_rewards, new_rewards)[:len(prompts)]
    print(
        f"[DEBUG][RANK {self.accelerator.process_index}] lin999 {mode} rewards.shape:{rewards.shape},len(prompts):{len(prompts)}")
    prompt_ids = merge_with_padding(identical_prompt_ids, new_prompt_ids, self.processing_class.pad_token_id, left_pad=True)[:len(prompts)]
    prompt_mask = merge_with_padding(identical_prompt_mask, new_prompt_mask, 0, left_pad=True)[:len(prompts)]
    completion_ids = merge_with_padding(identical_completion_ids, new_completion_ids, self.processing_class.pad_token_id, left_pad=False)[:len(prompts)]
    completion_mask = merge_with_padding(identical_completion_mask, new_completion_mask, 0, left_pad=False)[:len(prompts)]
    break

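In the first evaluate step the else branch executes and everything runs normally. In the second evaluate step, however, the if branch executes, so the dimension of rewards differs from the first time.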

For example, when I run on 4 GPUs, the hyperparameters are as follows:

[INFO|trainer.py:2414] 2025-07-01 11:55:15,632 >> ***** Running training *****
[INFO|trainer.py:2417] 2025-07-01 11:55:15,633 >> Instantaneous batch size per device = 16
[INFO|trainer.py:2420] 2025-07-01 11:55:15,633 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2421] 2025-07-01 11:55:15,633 >> Gradient Accumulation steps = 2
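
The figures in this log are consistent with a 4-GPU run. As a quick sanity check, the total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps. The variable names below are illustrative and not taken from the repository; the per-step sample count across all ranks (16 × 4) is also computed for reference.

```python
# Values read from the training log and the report (4 GPUs).
per_device_batch_size = 16
num_gpus = 4
gradient_accumulation_steps = 2

# Samples processed across all ranks in a single forward/backward pass.
samples_across_ranks = per_device_batch_size * num_gpus

# Effective batch size per optimizer step, as reported by the Trainer log.
total_train_batch_size = samples_across_ranks * gradient_accumulation_steps

print(samples_across_ranks)    # 64
print(total_train_batch_size)  # 128
```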

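With these settings, after the first evaluate step runs snippet 1, rewards has shape [16], but after the second evaluate step runs snippet 1, rewards has shape [64].
This makes per_token_loss = - per_token_logps * advantages.unsqueeze(1) fail with a dimension mismatch, because the first dimension of per_token_logps is always 16.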
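
To make the failure concrete, here is a minimal pure-Python sketch of the standard NumPy/PyTorch broadcasting rule applied to the shapes in this report. It assumes that advantages has the same batch dimension as rewards, as the report implies. The helper `broadcast_shape` and the value of `seq_len` are hypothetical and only for illustration; they are not part of the repository.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Shapes are aligned from the right; each pair of dimensions must be
    equal, or one of them must be 1.
    """
    rank = max(len(a), len(b))
    # Pad the shorter shape with leading 1s so both have the same rank.
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)
    result = []
    for da, db in zip(a, b):
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(result)


seq_len = 8  # hypothetical completion length; not stated in the report

# First evaluate step: advantages has 16 entries, so unsqueeze(1) gives [16, 1].
print(broadcast_shape((16, seq_len), (16, 1)))  # (16, 8)

# Second evaluate step: advantages has 64 entries, so unsqueeze(1) gives [64, 1].
try:
    broadcast_shape((16, seq_len), (64, 1))
except ValueError as err:
    print("mismatch:", err)
```

The second call fails because 16 and 64 differ and neither is 1, which is the same condition under which the multiplication in the loss computation cannot broadcast.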
