CUDA out of memory on multiple GPUs #40

Open
zxccade opened this issue Feb 24, 2025 · 2 comments
zxccade commented Feb 24, 2025

Hi author,

Thanks for your great work on Sa2VA!

I ran into an issue when trying to run inference on multiple GPUs using the script you provided for Sa2VA-26B.

I found that only the first GPU was being used for inference: it ran out of CUDA memory while the other GPUs still had plenty of spare memory.
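
A quick per-GPU check along these lines (a rough sketch; watching nvidia-smi shows the same thing) makes the imbalance visible, with only device 0 filling up:

import torch

# Print how much memory is currently in use on each visible GPU.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {(total - free) / 1e9:.1f} GB used / {total / 1e9:.1f} GB total")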

Could you help me address this issue?

Best,

@HarborYuan (Collaborator)

Hi @zxccade,

Could you please provide the script you are using?


zxccade commented Feb 25, 2025

Hi there,

Here is the script I used. I am trying to run inference on long videos with multiple GPUs.

import os

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

def get_rank_and_world_size():
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    return rank, world_size

def split_model(model_name):
    import math
    device_map = {}
    num_gpus = torch.cuda.device_count()
    rank, world_size = get_rank_and_world_size()
    num_gpus = num_gpus // world_size

    num_layers = {'Sa2VA-8B': 32, 'Sa2VA-26B': 48,
                  'Sa2VA-38B': 64, 'Sa2VA-78B': 80}[model_name]
    # Since the first GPU will be used for ViT, treat it as 0.8 GPU.
    num_layers_per_gpu = math.ceil(num_layers / (num_gpus - 0.2))
    num_layers_per_gpu = [num_layers_per_gpu] * num_gpus
    num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.8)
    layer_cnt = 0
    for i, num_layer in enumerate(num_layers_per_gpu):
        for j in range(num_layer):
            device_map[f'language_model.model.layers.{layer_cnt}'] = rank + world_size * i
            layer_cnt += 1
    device_map['vision_model'] = rank
    device_map['mlp1'] = rank
    device_map['language_model.model.tok_embeddings'] = rank
    device_map['language_model.model.embed_tokens'] = rank
    device_map['language_model.output'] = rank
    device_map['language_model.model.norm'] = rank
    device_map['language_model.lm_head'] = rank
    device_map[f'language_model.model.layers.{num_layers - 1}'] = rank
    device_map['grounding_encoder'] = rank
    device_map['text_hidden_fcs'] = rank
    return device_map

path = "ByteDance/Sa2VA-8B"
device_map = split_model("Sa2VA-8B")
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map=device_map,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

vid_frames = []
for img_path in images_paths:
    img = Image.open(img_path).convert('RGB')
    vid_frames.append(img)
text_prompts = f"<image>Answer the question about the video: {data['question']} \n"
input_dict = {
    'video': vid_frames,
    'text': text_prompts,
    'past_text': '',
    'mask_prompts': None,
    'tokenizer': tokenizer,
}
return_dict = model.predict_forward(**input_dict)
answer = return_dict["prediction"]  # the text-format answer
answer = answer.split("<|im_end|>")[0]
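
In case it helps with debugging, the placement that is actually applied can be inspected after loading; when a device_map is passed, transformers should record it on the model (rough sketch):

# Show which device each module actually landed on
# (hf_device_map is populated when device_map is passed to from_pretrained).
for name, device in model.hf_device_map.items():
    print(f"{name} -> {device}")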
