
PPO multi-GPU training error: 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!' #2830

Open
Vnbxu opened this issue Jan 1, 2025 · 0 comments


Vnbxu commented Jan 1, 2025

When fine-tuning with PPO + MP (model parallelism), I get this error. Could you help take a look? I tried both qwen2_5-7b-instruct and llama3-8b-instruct and both hit this problem; DPO does not.

The script used is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rlhf \
    --rlhf_type ppo \
    --model_type  llama3-8b-instruct \
    --sft_type  lora \
    --dataset hh-rlhf-cn-harmless-base-cn \
    --reward_model_id_or_path  /output/llama3-8b-instruct/v2-20241231-154134/checkpoint-4-merged \
    --reward_model_type  llama3-8b-instruct \
    --num_train_epochs  2  \
    --lora_target_modules  ALL  \
    --gradient_checkpointing  true  \
    --batch_size  1  \
    --learning_rate  5e-5  \
    --gradient_accumulation_steps  16  \
    --warmup_ratio  0.03  \
    --save_total_limit  2
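
For context, the error itself is the generic PyTorch device-mismatch failure: under model parallelism the reward model can end up on a different GPU than the policy, and a tensor produced on one device is fed into a module whose weights sit on another. Below is a minimal, hypothetical sketch of that failure mode (not swift's actual code; all names are made up, and it assumes a machine with at least four CUDA devices):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a policy layer on cuda:0 and a reward head on cuda:3,
# mimicking how model parallelism can split modules across GPUs.
policy_layer = nn.Linear(16, 16).to("cuda:0")
reward_head = nn.Linear(16, 1).to("cuda:3")

hidden = policy_layer(torch.randn(2, 16, device="cuda:0"))  # lives on cuda:0

# reward = reward_head(hidden)  # raises: Expected all tensors to be on the
#                               # same device, ... cuda:0 and cuda:3!

# Standard remedy: move the tensor to the consuming module's device first.
reward = reward_head(hidden.to(next(reward_head.parameters()).device))
print(reward.device)  # cuda:3
```

In other words, somewhere in the PPO path the reward model's inputs (or outputs) are not being moved onto its device, which DPO avoids because it does not call a separate reward model.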