L1 Paper Results & Code Version Discrepancies

Thank you for sharing this repository!

I'm trying to reproduce the results from "L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning".
Were the reported results obtained with the code from before or after the "Refactor to Latest Versions" commit on May 5th?

I'm asking because when I fine-tuned the L1-Qwen-1.5B-Exact model into L1-Qwen-1.5B-Max using 'run_l1_max.sh', I wasn't able to match the performance reported in the paper.
Could that refactor have introduced a significant difference in the fine-tuning process?

Could you also share the specific package versions used in your environment? I ran into several errors during installation because the library versions are not pinned.

Thanks.
