qwen2-vl-7b runs out of memory. Note: host RAM, not GPU memory! Memory is not being reclaimed #2757

Open
hl0737 opened this issue Dec 24, 2024 · 3 comments

hl0737 commented Dec 24, 2024

Describe the bug
What the bug is and how to reproduce it, preferably with screenshots

As in the title.

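[Screenshot: resource usage graph]

This is the resource usage graph. The machine has 512 GB of RAM. GPU memory has been fine throughout, holding steady at around 50 GB, but host memory keeps growing. There is a memory leak somewhere.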
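To confirm that it is the training process's host memory that grows (rather than page cache or another job on the node), one option is to sample the resident set size (RSS) at regular step intervals and look at the trend. The following is a minimal sketch, not part of swift: the helper names `rss_mib` and `leak_slope` are hypothetical, and it assumes a Linux host such as the Ubuntu 22.04 machine described here. With `NPROC_PER_NODE=8` each rank is a separate process, so the per-process values would need to be summed.

```python
import resource
import sys


def rss_mib() -> float:
    """Return the resident set size of the current process in MiB."""
    try:
        # On Linux, /proc/self/status reports VmRSS in kB.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024
    except OSError:
        pass
    # Fallback: peak RSS from getrusage (kB on Linux, bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024


def leak_slope(samples):
    """Least-squares slope, in MiB per step, of (step, rss_mib) samples."""
    n = len(samples)
    mean_x = sum(step for step, _ in samples) / n
    mean_y = sum(rss for _, rss in samples) / n
    var_x = sum((step - mean_x) ** 2 for step, _ in samples)
    if var_x == 0:
        return 0.0
    cov = sum((step - mean_x) * (rss - mean_y) for step, rss in samples)
    return cov / var_x
```

A slope that stays clearly positive after the warm-up steps would be consistent with the steady growth shown in the graph; a slope near zero would point to something outside the training process.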

Your hardware and system info
Write your system info here, such as CUDA version, OS, GPU model, and torch version

cuda12.6
ubuntu 22.04
swift 3.0.0
gpu a800 80gb
torch 2.5.1

Additional context
Add any other context about the problem here

hl0737 commented Dec 24, 2024

For reference, the launch command is:

```shell
source /maindata/data/shared/public/liang.hu/conda/bin/activate
conda activate swift

# Environment variables are given as a command prefix so that swift sft inherits them.
NNODES=$WORLD_SIZE \
NODE_RANK=$RANK \
MASTER_ADDR=$MASTER_ADDR \
NPROC_PER_NODE=8 \
MAX_PIXELS=602112 \
VIDEO_MAX_PIXELS=602112 \
NFRAMES=8 \
swift sft \
    --model /maindata/data/shared/public/chunli.peng/ckpt/Qwen2-VL-7B-Instruct/ \
    --train_type full \
    --torch_dtype bfloat16 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --dataset /maindata/data/shared/public/liang.hu/infer1/train_qwen2vl_sft_swift.jsonl \
    --output_dir /maindata/data/shared/public/liang.hu/infer1/qwen2_sft/test \
    --num_train_epochs 1 \
    --save_strategy 'no' \
    --eval_strategy 'no' \
    --logging_steps 1 \
    --warmup_ratio 0.05 \
    --report_to wandb \
    --gradient_checkpointing true \
    --freeze_vit true \
    --deepspeed zero2 \
    --attn_impl flash_attn
```

The data format is:

[Screenshot: data format]

@TimeLessLing

I seem to have run into this problem too. Have you found a way to solve it?

hl0737 commented Jan 3, 2025

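> I seem to have run into this problem too. Have you found a way to solve it?

Yes, and the workaround is crude: just provision more RAM. As long as the job finishes before memory runs out, it doesn't count as an OOM, you know what I mean, hahaha.

Apparently a single node on Alibaba Cloud can now have up to 2 TB of RAM, so see if you can provision more.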
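If the growth is roughly linear, the "provision more RAM" workaround can be sized with simple arithmetic. The sketch below is only an illustration under that linear-growth assumption: the function names are hypothetical, and the baseline and growth rate would have to be measured on the actual run.

```python
def required_ram_gib(baseline_gib: float, growth_gib_per_step: float, total_steps: int) -> float:
    """Estimate peak host RAM, assuming memory grows linearly with the step count."""
    return baseline_gib + growth_gib_per_step * total_steps


def fits(available_gib: float, baseline_gib: float, growth_gib_per_step: float, total_steps: int) -> bool:
    """Return True if the run is expected to finish before host RAM is exhausted."""
    return required_ram_gib(baseline_gib, growth_gib_per_step, total_steps) <= available_gib
```

For example, with made-up numbers of a 64 GiB baseline, 0.125 GiB of growth per step, and 3000 steps, the estimate is 439 GiB, which would fit in a 512 GB machine; doubling the step count would not.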
