Question about the LitePPO configuration #308

@annn521

Description

The LitePPO guide at https://alibaba.github.io/ROLL/docs/User%20Guides/Algorithms/LitePPO lists the LitePPO configuration, which includes:

adv_estimator: "gae"
num_return_sequences_in_group: 1

LitePPO takes the reward mean per group and normalizes at the batch level. So why is the advantage estimator set to gae rather than grpo, and why is there only one response per prompt?
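
To make the concern concrete, here is a minimal sketch of group-mean, batch-std advantage normalization as I understand it. This is not ROLL's implementation: the function name `group_mean_batch_std_advantages` is hypothetical, and taking the batch standard deviation over the raw rewards is my assumption. It shows that with a group size of 1 every reward equals its own group mean, so every advantage collapses to zero.

```python
from statistics import mean, pstdev

def group_mean_batch_std_advantages(rewards_by_group, eps=1e-6):
    """Hypothetical helper: rewards_by_group holds one list of rewards per prompt."""
    # Center each reward on the mean of its own group (responses to the same prompt).
    centered = [[r - mean(group) for r in group] for group in rewards_by_group]
    # Assumption: the scale is the standard deviation over all raw rewards in the batch.
    all_rewards = [r for group in rewards_by_group for r in group]
    batch_std = pstdev(all_rewards)
    return [[c / (batch_std + eps) for c in group] for group in centered]

# Group size 1 (num_return_sequences_in_group: 1): all advantages are zero.
single = group_mean_batch_std_advantages([[1.0], [0.0], [1.0], [0.0]])

# Group size 4: advantages are non-zero wherever responses to one prompt differ.
grouped = group_mean_batch_std_advantages(
    [[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
)
```

If the documented settings are intended, the group-level centering would presumably have to come from somewhere other than this kind of per-prompt grouping, which is what I would like to understand.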
