This paper has been very inspiring to us, thank you. I noticed that LIMO used only 817 training examples to full-parameter fine-tune Qwen2.5-32B-Instruct. Has there been any further research on the impact of Reinforcement Learning (RL) on the model's performance? If so, do you think RL could further enhance the model, similar to the improvements seen with DeepSeek-R1?