
reasoning length = thinking length + solution length? #23

@Jarvis33yu

Description


Thank you for your great work on the L1 paper. I have one question I'd like to clarify. The paper describes LCPO as controlling the length of the reasoning chain. However, when I look into the code, the length being controlled appears to be the total output length, i.e., thinking length + solution length.

Could you please clarify whether LCPO controls only the reasoning-chain length or the entire output length (reasoning + solution)?
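To make the two readings concrete, here is a minimal sketch of how the measured length differs depending on what is counted. It assumes the reasoning chain ends at a `</think>` delimiter and uses whitespace-split words as a stand-in for tokens; `THINK_END`, `split_output`, `n_tokens`, and `length_gap` are hypothetical names, not taken from the L1 repository.

```python
# Hypothetical illustration: the "</think>" delimiter and word-based
# "tokenization" are assumptions, not taken from the L1 repository.
THINK_END = "</think>"


def split_output(text: str) -> tuple[str, str]:
    """Split a response into (thinking, solution) at the first THINK_END tag."""
    thinking, sep, solution = text.partition(THINK_END)
    if not sep:
        # No delimiter found: treat the whole response as thinking.
        return text, ""
    return thinking, solution


def n_tokens(text: str) -> int:
    # Whitespace split stands in for a real tokenizer.
    return len(text.split())


def length_gap(text: str, target: int, thinking_only: bool) -> int:
    """Absolute gap between a target length and the measured length."""
    thinking, solution = split_output(text)
    if thinking_only:
        measured = n_tokens(thinking)
    else:
        measured = n_tokens(thinking) + n_tokens(solution)
    return abs(target - measured)


response = "Let me add 2 and 3 step by step </think> The answer is 5"
print(length_gap(response, target=10, thinking_only=True))   # 1
print(length_gap(response, target=10, thinking_only=False))  # 3
```

With a 10-token target, the same response is 1 token off if only the thinking is counted, but 3 tokens off if the solution is counted too, so the two definitions can push training toward different behaviour.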
