Description
Thank you for sharing this repository!
I'm trying to reproduce the results from "L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning".
Were the reported results obtained with the code before or after the "Refactor to Latest Versions" commit on May 5th?
I'm asking because when I fine-tuned the L1-Qwen-1.5B-Exact model to L1-Qwen-1.5B-Max using 'run_l1_max.sh', I wasn't able to match the performance described in the paper.
Could that refactor have introduced a significant difference in the fine-tuning process?
Could you also share the specific package versions used in your environment? I ran into several errors during installation when setting up the environment, because the library versions are not pinned.
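In case it makes sharing the versions easier, here is a minimal sketch that prints the installed versions of a few packages in a pinned, `requirements.txt`-like form. The package list is only my guess at what matters for this setup and is not taken from the repository, so please adjust it as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list: my guess at the relevant packages, not taken from the repo.
PACKAGES = ["torch", "transformers", "vllm", "flash-attn", "ray"]


def collect_versions(names):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        # Print pinned lines for installed packages and a comment for missing ones.
        print(f"{name}=={ver}" if ver else f"# {name}: not installed")
```

Running this in the environment that produced the paper's results would give a list I could install from directly. The full output of `pip freeze` would work just as well.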
Thanks.