
Large difference in DPO training speed between peft and ms-swift with identical settings (600 s/iter vs 40 s/iter) #2815

Open
maoulee opened this issue Dec 31, 2024 · 4 comments

maoulee commented Dec 31, 2024

swift CLI:
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0,1 \
swift rlhf \
    --rlhf_type dpo \
    --model_type qwen2_5 \
    --model /root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/ \
    --train_type lora \
    --tuner_backend peft \
    --dataset llamafactory/ultrafeedback_binarized#2000 \
    --num_train_epochs 2 \
    --learning_rate 5e-6 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --lora_dropout 0.05 \
    --logging_steps 100 \
    --quant_method bnb \
    --quant_bit 4 \
    --max_new_tokens 1500

Fine-tuning speed:
Train: 7%|█████▏ | 17/246 [10:46<2:15:35, 35.53s/it]
Train: 28%|█████████████████████ | 69/246 [41:48<1:47:15, 36.36s/it]

peft setup:
import torch
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from trl import DPOConfig, DPOTrainer, LogCompletionsCallback
from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE

# map_instruction is the user's own preprocessing function (defined elsewhere).
train_dataset = load_dataset("llamafactory/ultrafeedback_binarized", split="train")
train_dataset = train_dataset.shuffle(seed=42)
train_dataset = train_dataset.select(range(2000))
train_dataset = train_dataset.map(map_instruction)

test_dataset = load_dataset("llamafactory/ultrafeedback_binarized", split="test")
test_dataset = test_dataset.shuffle(seed=42)
test_dataset = test_dataset.select(range(200))
test_dataset = test_dataset.map(map_instruction)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="/root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    use_cache=False,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()
peft_config = LoraConfig(
    r=4,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
tokenizer = AutoTokenizer.from_pretrained("/root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/")
EOS_TOKEN = tokenizer.eos_token

# Tokenizer settings
if tokenizer.chat_template is None:
    tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"


training_args = DPOConfig(
    output_dir="/llm/checkpoint/",  # Output directory for checkpoints and final model
    per_device_train_batch_size=2,  # Batch size per device during training
    gradient_accumulation_steps=16,  # Number of gradient accumulation steps
    num_train_epochs=4,  # Total number of training epochs
    learning_rate=5e-7,  # Learning rate
    logging_dir="./logs",  # Directory for storing logs
    logging_steps=500,  # Log every 500 update steps
    save_steps=500,  # Save a checkpoint every 500 update steps
    eval_strategy="no",  # No evaluation during training
    beta=0.1,  # The beta parameter for the DPO loss
    loss_type="hinge",
    optim="adamw_8bit",
    max_length=2048,
    max_prompt_length=500,
)
model.enable_input_require_grads()
trainer = DPOTrainer(
    model=model,
    args=training_args,
    peft_config=peft_config,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer
)


# Configure generation for evaluation
if training_args.eval_strategy != "no":
    generation_config = GenerationConfig(
        max_new_tokens=2048,
        do_sample=True,
        temperature=1.0
    )
    completions_callback = LogCompletionsCallback(trainer, generation_config, num_prompts=8)
    trainer.add_callback(completions_callback)

# Train the model
trainer.train()

Fine-tuning speed:
Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▎ | 1/248 [10:40<43:57:17, 640.64s/it]

What exactly causes the large speed difference between the two? And could swift add support for online_dpo?

Jintao-Huang (Collaborator) commented

Probably a matter of the optimizer (optim).
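
A minimal way to test this hypothesis (assuming ms-swift falls back to the transformers default optimizer, adamw_torch, when --optim is not given) is to change only the optim field in the DPOConfig from the peft script above and re-run:

from trl import DPOConfig

training_args = DPOConfig(
    output_dir="/llm/checkpoint/",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=4,
    learning_rate=5e-7,
    optim="adamw_torch",  # assumed swift default; the script above used "adamw_8bit"
    loss_type="hinge",
    beta=0.1,
    max_length=2048,
    max_prompt_length=500,
)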

maoulee commented Dec 31, 2024

> Probably a matter of the optimizer (optim).

Then how do I export the configuration that swift used for this peft-backed fine-tuning run? Which part of the documentation should I refer to?

Jintao-Huang (Collaborator) commented

An args.json file with the training configuration is saved.
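
A minimal sketch for inspecting that file and comparing it with the peft setup, assuming args.json sits in the swift run's output directory (the path is a placeholder and the key names are assumptions about what args.json contains):

import json

swift_output_dir = "/path/to/swift/output"  # placeholder: this run's output directory

with open(f"{swift_output_dir}/args.json") as f:
    swift_args = json.load(f)

# Print the settings most likely to affect step time; keys not present print None.
for key in ("optim", "loss_type", "torch_dtype", "per_device_train_batch_size",
            "gradient_accumulation_steps", "max_length"):
    print(key, "=", swift_args.get(key))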

maoulee commented Jan 2, 2025

> An args.json file with the training configuration is saved.

Thanks for the pointer! Based on the args, I then changed the key settings (optim and loss func) to match swift, but the speed gap is still huge: peft 180 s/iter vs swift 39 s/iter.
