This week's paper is RewardBench: Evaluating Reward Models for Language Modeling.
RewardBench is the first toolkit for benchmarking reward models. Beyond the benchmark itself, the authors compare reward models across scales, test their reasoning capabilities, highlight three buckets of refusal behavior, and share details on the inner workings of reward models (RMs).
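To make the evaluation setup concrete, here is a minimal sketch of the core idea behind benchmarking a reward model on preference data: given a prompt with a "chosen" and a "rejected" response, the RM should score the chosen one higher, and accuracy over such triples is the headline metric. The model name and the toy data below are illustrative assumptions, not RewardBench's actual configuration or code.

```python
# Sketch: score preference triples with a sequence-classification reward model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed example RM; any scalar-output reward model would do.
model_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def reward(prompt: str, response: str) -> float:
    """Return the scalar reward the model assigns to a prompt/response pair."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0].item()

# Toy preference triples: (prompt, chosen, rejected).
triples = [
    ("What is 2 + 2?", "2 + 2 equals 4.", "2 + 2 equals 5."),
]

# A triple counts as correct when the chosen response gets the higher reward.
correct = sum(
    reward(p, chosen) > reward(p, rejected) for p, chosen, rejected in triples
)
print(f"accuracy: {correct / len(triples):.2f}")
```

RewardBench groups such triples into subsets (e.g., chat, reasoning, safety) and reports per-subset accuracy, which is what lets the authors compare models and surface the behaviors discussed above.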
Further Reading: