reasoning length = thinking length + solution length?

Thank you for your great work on the L1 paper. I have one question I'd like to clarify. The paper says that LCPO allows control over the length of the reasoning chain. However, when I look at the code, the length being controlled seems to be the total output length, i.e. thinking length + solution length.

Could you please clarify whether LCPO is controlling just the reasoning chain length, or the entire output length (reasoning + solution)? 
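
To make the distinction concrete, here is a minimal sketch of the two quantities I am asking about. It is not taken from the repository: the `<think>`/`</think>` delimiters, the helper name `split_lengths`, and whitespace splitting as a stand-in for the real tokenizer are all my own assumptions for illustration.

```python
# Hypothetical illustration only; not the repository's code.
# Assumes the output looks like "<think> ... </think> solution text".
THINK_OPEN = "<think>"    # assumed delimiter
THINK_CLOSE = "</think>"  # assumed delimiter


def split_lengths(output: str, tokenize=str.split) -> dict:
    """Count tokens in the thinking part, the solution part, and both.

    `tokenize` defaults to whitespace splitting as a stand-in for the
    model's tokenizer. Delimiter tags are not counted.
    """
    head, sep, tail = output.partition(THINK_CLOSE)
    if sep:
        thinking, solution = head, tail
    else:
        # No closing tag found: treat the whole output as thinking.
        thinking, solution = output, ""
    thinking = thinking.replace(THINK_OPEN, "")
    n_thinking = len(tokenize(thinking))
    n_solution = len(tokenize(solution))
    return {
        "thinking": n_thinking,
        "solution": n_solution,
        "total": n_thinking + n_solution,
    }


lengths = split_lengths("<think>a b c</think> the answer is 4")
print(lengths)
```

My question is whether the length target is compared against something like `lengths["thinking"]` or something like `lengths["total"]`.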

reasoning length = thinking length + solution length? #23
