When fine-tuning with PPO + MP, I get an error. Could someone help take a look? I tried both qwen2_5-7b-instruct and llama3-8b-instruct and hit the same problem; DPO does not have this issue.
The script used is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rlhf \
    --rlhf_type ppo \
    --model_type llama3-8b-instruct \
    --sft_type lora \
    --dataset hh-rlhf-cn-harmless-base-cn \
    --reward_model_id_or_path /output/llama3-8b-instruct/v2-20241231-154134/checkpoint-4-merged \
    --reward_model_type llama3-8b-instruct \
    --num_train_epochs 2 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2
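Since the error appears with PPO + MP but not with DPO, a minimal sketch of an isolation step (not from the original report) is to rerun the identical command on a single GPU, so that model parallelism is taken out of the picture; only CUDA_VISIBLE_DEVICES changes, every other argument stays as above.

# assumed sanity check: same command, single GPU, no model parallelism
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type ppo \
    --model_type llama3-8b-instruct \
    --sft_type lora \
    --dataset hh-rlhf-cn-harmless-base-cn \
    --reward_model_id_or_path /output/llama3-8b-instruct/v2-20241231-154134/checkpoint-4-merged \
    --reward_model_type llama3-8b-instruct \
    --num_train_epochs 2 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2

If this single-GPU run completes, that points at the MP path specifically rather than the PPO configuration itself.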