RewardBench: Evaluating Reward Models for Language Modeling

This week's paper is RewardBench: Evaluating Reward Models for Language Modeling.

RewardBench is the first toolkit for benchmarking reward models. In addition to the benchmark itself, the authors analyze how reward models scale, test their reasoning capabilities, highlight three buckets of refusal behavior, and share details on the inner workings of reward models (RMs).
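As a rough illustration of the kind of evaluation a reward-model benchmark performs, here is a minimal sketch of pairwise accuracy over preference pairs: the RM should score the chosen completion above the rejected one. The `score` function and the dataset fields are illustrative assumptions, not RewardBench's actual API.

```python
# Minimal sketch of pairwise reward-model evaluation over preference pairs.
# `score` is a hypothetical stand-in for whatever reward model is being
# benchmarked; the fields (prompt, chosen, rejected) mirror the common
# preference-pair format.

from typing import Callable, Dict, List


def pairwise_accuracy(
    examples: List[Dict[str, str]],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of pairs where the RM scores the chosen completion
    above the rejected one."""
    correct = 0
    for ex in examples:
        chosen_score = score(ex["prompt"], ex["chosen"])
        rejected_score = score(ex["prompt"], ex["rejected"])
        if chosen_score > rejected_score:
            correct += 1
    return correct / len(examples) if examples else 0.0


if __name__ == "__main__":
    # Toy usage with a purely illustrative length-based scorer.
    toy_data = [
        {"prompt": "Q?", "chosen": "A detailed, helpful answer.", "rejected": "No."},
    ]
    print(pairwise_accuracy(toy_data, lambda prompt, completion: float(len(completion))))
```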

Further Reading: