# imports (not shown in the original post; presumably something like this)
from transformers import GenerationConfig
from trl import DPOConfig, DPOTrainer, LogCompletionsCallback
from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE

# Tokenizer settings
if tokenizer.chat_template is None:
    tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
training_args = DPOConfig(
    output_dir="/llm/checkpoint/",    # Output directory for checkpoints and final model
    per_device_train_batch_size=2,    # Batch size per device during training
    gradient_accumulation_steps=16,   # Number of gradient accumulation steps
    num_train_epochs=4,               # Total number of training epochs
    learning_rate=5e-7,               # Learning rate
    logging_dir="./logs",             # Directory for storing logs
    logging_steps=500,                # Log every X update steps
    save_steps=500,                   # Save a checkpoint every X update steps
    eval_strategy="no",               # No evaluation during training
    beta=0.1,                         # The beta parameter for the DPO loss
    loss_type="hinge",                # Hinge (SLiC-style) variant of the DPO loss
    optim="adamw_8bit",               # 8-bit AdamW optimizer (bitsandbytes)
    max_length=2048,                  # Max total sequence length
    max_prompt_length=500,            # Max prompt length
)
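
For reference, loss_type="hinge" selects the SLiC-style hinge variant of the DPO objective instead of the default sigmoid loss. Schematically (a sketch of the objective, not TRL's actual implementation):

import torch

def hinge_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                   reference_chosen_logps, reference_rejected_logps, beta=0.1):
    # difference between the policy's and the reference model's
    # chosen-vs-rejected log-probability ratios
    logits = (policy_chosen_logps - policy_rejected_logps) \
             - (reference_chosen_logps - reference_rejected_logps)
    # hinge: loss is zero once the scaled margin beta * logits exceeds 1
    return torch.relu(1 - beta * logits).mean()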
model.enable_input_require_grads()
trainer = DPOTrainer(
    model=model,
    args=training_args,
    peft_config=peft_config,    # DPOTrainer applies the LoRA adapter itself
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
)
# Configure generation for evaluation
if training_args.eval_strategy != "no":
    generation_config = GenerationConfig(
        max_new_tokens=2048,
        do_sample=True,
        temperature=1.0,
    )
    completions_callback = LogCompletionsCallback(trainer, generation_config, num_prompts=8)
    trainer.add_callback(completions_callback)
# Train the model
trainer.train()
swift_cli:
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0,1 \
swift rlhf \
    --rlhf_type dpo \
    --model_type qwen2_5 \
    --model /root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/ \
    --train_type lora \
    --tuner_backend peft \
    --dataset llamafactory/ultrafeedback_binarized#2000 \
    --num_train_epochs 2 \
    --learning_rate 5e-6 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --gradient_accumulation_steps 16 \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --lora_dropout 0.05 \
    --logging_steps 100 \
    --quant_method bnb \
    --quant_bit 4 \
    --max_new_tokens 1500
Fine-tuning speed:
Train: 7%|█████▏ | 17/246 [10:46<2:15:35, 35.53s/it]
Train: 28%|█████████████████████ | 69/246 [41:48<1:47:15, 36.36s/it]
peft setup:
# imports (not shown in the original post; presumably)
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, prepare_model_for_kbit_training

train_dataset = load_dataset("llamafactory/ultrafeedback_binarized", split="train")
train_dataset = train_dataset.shuffle(seed=42)
train_dataset = train_dataset.select(range(2000))
train_dataset = train_dataset.map(map_instruction)
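
map_instruction is not defined in the post; assuming the llamafactory/ultrafeedback_binarized columns are instruction/chosen/rejected, a mapping onto the prompt/chosen/rejected fields that DPOTrainer expects might look roughly like this (hypothetical reconstruction):

def map_instruction(example):
    # hypothetical helper -- the actual map_instruction is not shown in the issue
    return {
        "prompt": example["instruction"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }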
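
bnb_config, used in the model load below, is also not shown; a typical 4-bit NF4 QLoRA config (an assumption, not necessarily the poster's exact settings) would be:

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(    # assumed config; not shown in the issue
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)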
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="/root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    use_cache=False,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()
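
Note that the swift command above passes --gradient_checkpointing_kwargs '{"use_reentrant": false}', while this call uses the default. transformers accepts the same kwarg if the two runs should match:

model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})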
peft_config = LoraConfig(
    r=4,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
tokenizer = AutoTokenizer.from_pretrained("/root/.cache/modelscope/hub/unsloth/Qwen2___5-32B-Instruct-bnb-4bit/")
EOS_TOKEN = tokenizer.eos_token
Fine-tuning speed:
Could not estimate the number of tokens of the input, floating-point operations will not be computed
0%|▎ | 1/248 [10:40<43:57:17, 640.64s/it]
What exactly causes the large speed gap between the two runs? And could swift add support for online_dpo?