
L1 Paper Results & Code Version Discrepancies #22

@odedsc

Thank you for sharing this repository!

I'm trying to reproduce the results from "L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning".
Were the reported results obtained with the code from before or after the "Refactor to Latest Versions" commit on May 5th?

I'm asking because when I fine-tuned L1-Qwen-1.5B-Exact into L1-Qwen-1.5B-Max using 'run_l1_max.sh', I wasn't able to match the performance reported in the paper.
Could that refactor have introduced a significant difference in the fine-tuning process?

Could you also share the exact package versions used in your environment? I've run into several installation errors when setting up the environment without pinned library versions.
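To make the request concrete, output from something like the following would be enough. This is a minimal sketch using only the standard library's `importlib.metadata`. The package list is an assumption on my part (I haven't confirmed which libraries the repository actually depends on), and `collect_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of packages relevant to training; not confirmed from the repo.
PACKAGES = ["torch", "transformers", "vllm", "ray", "flash-attn"]


def collect_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions(PACKAGES).items():
        # Print installed packages in pip's "name==version" pinning format.
        if ver == "not installed":
            print(f"# {name}: not installed")
        else:
            print(f"{name}=={ver}")
```

Running `pip freeze` in the training environment would give a complete list as well.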

Thanks.
