This week's paper: Training language models to follow instructions with human feedback.
Learning to summarize from human feedback is an earlier OpenAI RLHF paper, and it has a good video summary.
Further Reading: