Constitutional AI: Harmlessness from AI Feedback

Constitutional AI: Harmlessness from AI Feedback is one of the first RLAIF papers from Anthropic.

Further Reading:
RAIN: Your Language Models Can Align Themselves without Finetuning
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Self-Rewarding Language Models
Suppressing Pink Elephants with Direct Principle Feedback
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Claude’s Constitution