This paper has been very inspiring to us, thank you. I noticed that LIMO used only 817 training examples to full-parameter fine-tune Qwen2.5-32B-Instruct. Has there been any further research on the impact of Reinforcement Learning (RL) on the model's performance? If so, do you think RL could further enhance the model, similar to the improvements seen with DeepSeek-R1?