- Built on fengshenbang-lm, this project is used to launch RLHF training for the various SFT models.
- See the WIKI for detailed usage instructions.
- GPT-Neox-6B
- Llama-7B
- ChatGLM-6B
- Llama-13B
- Llama2-13B
- Baichuan2-13B
- Transformer-XL-5B
- Llama-7B
- Llama-13B
- Llama2-13B
| | Token-level RM | Sample-level RM | Token-mix-sample RM | w/o RM |
|---|---|---|---|---|
| Token-level PPO | ✅ | ✅ | ✅ | ❌ |
| Step-level PPO | ✅ | ✅ | ✅ | ❌ |
| Sample-level PPO | ✅ | ✅ | ✅ | ❌ |
| EDPO | ❌ | ❌ | ❌ | ✅ |
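To make the RM granularities in the matrix concrete, below is a minimal, framework-agnostic sketch of how per-token PPO rewards could be derived from a token-level versus a sample-level RM. It is not this repository's implementation; `assign_rewards`, its arguments, and the tensor shapes are assumptions for illustration only.

```python
import torch

def assign_rewards(reward_model_scores: torch.Tensor,
                   response_mask: torch.Tensor,
                   granularity: str = "sample") -> torch.Tensor:
    """Illustrative only: turn RM outputs into per-token PPO rewards.

    reward_model_scores:
        - "token" granularity: one score per response token, shape (B, T)
        - "sample" granularity: one scalar per response, shape (B,)
    response_mask: 1 for response tokens, 0 for prompt/padding, shape (B, T)
    """
    rewards = torch.zeros_like(response_mask, dtype=torch.float)
    if granularity == "token":
        # Token-level RM: every response token carries its own reward.
        rewards = reward_model_scores * response_mask
    elif granularity == "sample":
        # Sample-level RM: place the scalar reward on the last response token;
        # earlier tokens receive credit via the return / advantage computation.
        last_idx = response_mask.cumsum(dim=-1).argmax(dim=-1)
        rewards[torch.arange(rewards.size(0)), last_idx] = reward_model_scores
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return rewards
```

A token-mix-sample RM would combine the two signals, e.g. adding the sample-level scalar to the last response token on top of the per-token rewards.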
- Vanilla Generation ++
- Best-of-N Generation (see the sketch after this list)
- Token-level BFS Generation
- Pipeline Generation
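As a reference point for the Best-of-N strategy above, here is a minimal sketch using Hugging Face `transformers`: sample N candidates from the policy, score each with a scalar reward model, and keep the highest-scoring one. The checkpoint names, the shared tokenizer, and the sampling parameters are assumptions for illustration, not this repository's generation pipeline.

```python
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

# Hypothetical checkpoints; substitute your own policy and reward models.
policy_name = "your-sft-model"
rm_name = "your-reward-model"

tokenizer = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def best_of_n(prompt: str, n: int = 4, max_new_tokens: int = 128) -> str:
    """Sample n candidate responses and return the one the RM scores highest."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = policy.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            num_return_sequences=n,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    candidates = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    with torch.no_grad():
        scores = [
            reward_model(**tokenizer(c, return_tensors="pt", truncation=True)).logits.squeeze().item()
            for c in candidates
        ]
    return candidates[int(torch.tensor(scores).argmax())]

print(best_of_n("Explain RLHF in one sentence."))
```

In practice the policy and reward model may use different tokenizers; the shared tokenizer here is purely a simplification.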