Cccei000/FeedbackDistillation

Unified Training Framework

  • Built on fengshenbang-lm, this project is used to launch RLHF training for the supported SFT models.
  • See the Wiki for detailed usage instructions.

Supported SFT Models

  • GPT-Neox-6B
  • Llama-7B
  • ChatGLM-6B
  • Llama-13B
  • Llama2-13B
  • Baichuan2-13B

Supported RM Models

  • Transformer-XL-5B
  • Llama-7B
  • Llama-13B
  • Llama2-13B

Features

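Preference-Learning Training Pipelines

A pipeline pairs a training algorithm with a reward model (RM) setting.

Reward model settings:

  • Token-level RM
  • Sample-level RM
  • Token-mix-sample RM
  • w/o RM (no reward model)

Training algorithms:

  • Token-level PPO
  • Step-level PPO
  • Sample-level PPO
  • EDPO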
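
To illustrate the difference between a sample-level and a token-level reward signal, here is a minimal sketch. It assumes the common convention that a sample-level RM yields one scalar per response (placed on the final token) while a token-level RM yields one score per generated token. The function names are hypothetical and do not come from this repository or from fengshenbang-lm.

```python
from typing import List


def sample_level_to_token_rewards(score: float, num_tokens: int) -> List[float]:
    """Assumed convention: one scalar per response, credited to the last token."""
    if num_tokens <= 0:
        return []
    rewards = [0.0] * num_tokens
    rewards[-1] = score
    return rewards


def token_level_rewards(scores: List[float]) -> List[float]:
    """Assumed convention: the RM already returns one score per token."""
    return list(scores)


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at each token position, computed right to left."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    sparse = sample_level_to_token_rewards(2.0, num_tokens=3)
    dense = token_level_rewards([1.0, 1.0, 1.0])
    print(discounted_returns(sparse, gamma=0.5))
    print(discounted_returns(dense))
```

With a sample-level score, every earlier token only sees the final reward through discounting; with token-level scores, each position receives its own signal. How this project actually combines the two (for example in the Token-mix-sample RM setting) is documented in the Wiki, not here.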

Generation Methods Supported in the Pipelines

  • Vanilla Generation ++
  • Best-of-N Generation
  • Token-level BFS Generation
  • Pipeline Generation
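
As an illustration of Best-of-N generation in general, the sketch below samples N candidate responses and keeps the one the scorer rates highest. The `generate` and `score` callables are hypothetical stand-ins for a policy model and a reward model; this is not the project's implementation.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    scored = [(score(prompt, c), c) for c in candidates]
    # max() keeps the first candidate on ties.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


if __name__ == "__main__":
    # Toy stand-ins: a random "policy" and a length-based "reward model".
    rng = random.Random(0)
    toy_generate = lambda p: p + " " + "ok " * rng.randint(1, 5)
    toy_score = lambda p, c: float(len(c))
    print(best_of_n("hello", toy_generate, toy_score, n=4))
```

In an RLHF setting the scorer would typically be one of the supported RM models, and the selected response would be used as the rollout or as a training target; both choices are assumptions for this sketch.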
