
swift3 internvl2_5, 2 nodes × 16 GPUs, LoRA: OOM on a single GPU #2760

Closed
UnderElm opened this issue Dec 25, 2024 · 2 comments

Comments

@UnderElm

nvidia-smi shows that only a few of the GPUs have any memory usage.

@UnderElm
Author

Additional details:

#!/bin/bash

SIZE_FACTOR=8 \
MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
NODE_RANK=0 \
MASTER_ADDR= \
NPROC_PER_NODE=8 \
swift sft \
    --model_type internvl2_5 \
    --model /workspace/swift/internVL2_5/InternVL2_5-8B \
    --train_type lora \
    --torch_dtype bfloat16 \
    --freeze_vit true \
    --dataset \
    --val_dataset \
    --deepspeed zero2 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --learning_rate 1e-4 \
    --per_device_train_batch_size 1 \
    --attn_impl flash_attn \
    --eval_steps 1000 \
    --save_steps 1000 \
    --num_train_epochs 5 \
    --save_total_limit 5 \
    --gradient_accumulation_steps 8 \
    --acc_strategy token \
    --max_length 2048 \
    --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
    --save_only_model True
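For reference, the effective global batch size implied by the launch above can be worked out from the script's own values (per-device batch size × processes per node × nodes × gradient accumulation steps); a quick sketch:

```shell
# Values taken from the launch script above.
per_device=1        # --per_device_train_batch_size
nproc_per_node=8    # NPROC_PER_NODE
nnodes=2            # NNODES
grad_accum=8        # --gradient_accumulation_steps

# Effective global batch size per optimizer step.
echo $(( per_device * nproc_per_node * nnodes * grad_accum ))  # prints 128
```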

@UnderElm
Author

Resolved by switching to zero3.
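The fix reported here amounts to changing the DeepSpeed stage in the launch command. A minimal sketch of the changed flag, assuming all other arguments stay as in the script above (ZeRO-2 partitions optimizer state and gradients but replicates model parameters on every rank; ZeRO-3 additionally partitions the parameters across ranks, lowering per-GPU memory):

```shell
# Before (OOM on individual GPUs):
#   --deepspeed zero2
# After (resolved):
swift sft \
    ... \
    --deepspeed zero3 \
    ...
```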

UnderElm closed this as completed Jan 3, 2025