When fine-tuning with PPO + MP, I get an error. Could someone help take a look? I tried both qwen2_5-7b-instruct and llama3-8b-instruct and hit the same problem; DPO does not have this issue.
The script used is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift rlhf \
    --rlhf_type ppo \
    --model_type llama3-8b-instruct \
    --sft_type lora \
    --dataset hh-rlhf-cn-harmless-base-cn \
    --reward_model_id_or_path /output/llama3-8b-instruct/v2-20241231-154134/checkpoint-4-merged \
    --reward_model_type llama3-8b-instruct \
    --num_train_epochs 2 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2
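Since the error appears with PPO + MP but not with DPO, a minimal sketch of an isolation step (not from the original report) is to rerun the identical command on a single GPU, so that model parallelism is taken out of the picture; only CUDA_VISIBLE_DEVICES changes, every other argument stays as above.

# assumed sanity check: same command, single GPU, no model parallelism
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type ppo \
    --model_type llama3-8b-instruct \
    --sft_type lora \
    --dataset hh-rlhf-cn-harmless-base-cn \
    --reward_model_id_or_path /output/llama3-8b-instruct/v2-20241231-154134/checkpoint-4-merged \
    --reward_model_type llama3-8b-instruct \
    --num_train_epochs 2 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2

If this single-GPU run completes, that points at the MP path specifically rather than the PPO configuration itself.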