Optionally save value model + GRPO #523

hamishivi · 2025-01-17T00:50:22Z

This PR adds an option to save the learned value model, so I can inspect it and see what it learns :)
It also adds GRPO training code built on Ray.

I still need to test all of this, but it's exciting.

vwxyzjn

LGTM. Feel free to merge whenever.

vwxyzjn · 2025-01-23T00:04:45Z

open_instruct/grpo_vllm_thread_ray_gtrl.py

-                #     print(f"{logprobs[0][:40]=}, {ref_logprobs[0][:40]=}, {kl.sum(1)=}")
-                non_score_reward = -args.beta * kl
-                non_score_reward_sum = non_score_reward.sum(1)
-                rlhf_reward = scores + non_score_reward_sum


Let's still log the rlhf_reward
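For reference, the removed lines compute the standard KL-penalized RLHF reward: a per-token penalty `-beta * kl`, summed over each sequence and added to the reward-model score. The following is a minimal sketch of that computation in plain Python, assuming `scores` holds one score per sequence and `kl` holds one list of per-token KL estimates per sequence. The function name `rlhf_reward_for_logging` is hypothetical, and the actual code operates on batched tensors rather than lists.

```python
from typing import List


def rlhf_reward_for_logging(
    scores: List[float], kl: List[List[float]], beta: float
) -> List[float]:
    """Hypothetical helper mirroring the removed diff lines.

    scores: one reward-model score per sequence.
    kl: per-token KL estimates, one list per sequence (assumed plain-list
        layout; the real code uses tensors and reduces with .sum(1)).
    """
    rewards = []
    for score, kl_row in zip(scores, kl):
        # non_score_reward = -args.beta * kl  (per-token penalty)
        non_score_reward = [-beta * k for k in kl_row]
        # non_score_reward_sum = non_score_reward.sum(1)  (sum over tokens)
        non_score_reward_sum = sum(non_score_reward)
        # rlhf_reward = scores + non_score_reward_sum
        rewards.append(score + non_score_reward_sum)
    return rewards


print(rlhf_reward_for_logging([1.0, 0.5], [[0.1, 0.2], [0.0, 0.4]], beta=0.05))
```

Computing this value only for logging keeps the metric comparable across runs, whether or not it is the quantity the loss is computed from.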

first stab at value model save

2132fd5

vwxyzjn approved these changes Jan 21, 2025

hamishivi added 5 commits January 21, 2025 21:44

add grpo

17fe44f

fix

d9762da

fix

947f002

fix

62f47d3

fix

c419614

hamishivi marked this pull request as ready for review January 22, 2025 17:28

hamishivi changed the title from "Optionally save value model" to "Optionally save value model + GRPO" Jan 22, 2025

hamishivi added 3 commits January 22, 2025 09:31

lint

2b3d7ce

fix kl calc

5929c96

lint

0591bf8

hamishivi requested review from vwxyzjn and natolambert January 22, 2025 23:53

vwxyzjn reviewed Jan 23, 2025

Address comments

b8abcad

hamishivi merged commit 6485393 into main Jan 23, 2025
3 checks passed

natolambert mentioned this pull request Jan 23, 2025

GRPO questions huggingface/trl#2608 (Open)
