CUDA out of memory on multiple GPUs #40

Open
zxccade opened this issue Feb 24, 2025 · 2 comments
zxccade commented Feb 24, 2025

Hi author,

Thanks for your great work on Sa2VA!

I ran into an issue when trying to run inference on multiple GPUs using the script you provided for Sa2VA-26B.

I found that only the first GPU was being used for inference: it ran out of CUDA memory while the other GPUs still had plenty of spare memory.
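
A quick per-GPU check along these lines (a rough sketch; watching nvidia-smi shows the same thing) makes the imbalance visible, with only device 0 filling up:

import torch

# Print how much memory is currently in use on each visible GPU.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {(total - free) / 1e9:.1f} GB used / {total / 1e9:.1f} GB total")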

Could you help me address this issue?

Best,

@HarborYuan (Collaborator)

Hi @zxccade,

Could you please provide the script you are using?


zxccade commented Feb 25, 2025

Hi there,

Here is the script I used. I am trying to run inference on long videos with multiple GPUs.

import os

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

def get_rank_and_world_size():
    rank = int(os.environ.get('RANK', 0))
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    return rank, world_size

def split_model(model_name):
    import math
    device_map = {}
    num_gpus = torch.cuda.device_count()
    rank, world_size = get_rank_and_world_size()
    num_gpus = num_gpus // world_size

    num_layers = {'Sa2VA-8B': 32, 'Sa2VA-26B': 48,
                  'Sa2VA-38B': 64, 'Sa2VA-78B': 80}[model_name]
    # Since the first GPU will be used for ViT, treat it as 0.8 GPU.
    num_layers_per_gpu = math.ceil(num_layers / (num_gpus - 0.2))
    num_layers_per_gpu = [num_layers_per_gpu] * num_gpus
    num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.8)
    layer_cnt = 0
    for i, num_layer in enumerate(num_layers_per_gpu):
        for j in range(num_layer):
            device_map[f'language_model.model.layers.{layer_cnt}'] = rank + world_size * i
            layer_cnt += 1
    device_map['vision_model'] = rank
    device_map['mlp1'] = rank
    device_map['language_model.model.tok_embeddings'] = rank
    device_map['language_model.model.embed_tokens'] = rank
    device_map['language_model.output'] = rank
    device_map['language_model.model.norm'] = rank
    device_map['language_model.lm_head'] = rank
    device_map[f'language_model.model.layers.{num_layers - 1}'] = rank
    device_map['grounding_encoder'] = rank
    device_map['text_hidden_fcs'] = rank
    return device_map

path = "ByteDance/Sa2VA-8B"
device_map = split_model("Sa2VA-8B")
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map=device_map,
).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

vid_frames = []
for img_path in images_paths:
    img = Image.open(img_path).convert('RGB')
    vid_frames.append(img)
text_prompts = f"<image>Answer the question about the video: {data['question']} \n"
input_dict = {
    'video': vid_frames,
    'text': text_prompts,
    'past_text': '',
    'mask_prompts': None,
    'tokenizer': tokenizer,
}
return_dict = model.predict_forward(**input_dict)
answer = return_dict["prediction"]  # the text-format answer
answer = answer.split("<|im_end|>")[0]
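
In case it helps with debugging, the placement that is actually applied can be inspected after loading; when a device_map is passed, transformers should record it on the model (rough sketch):

# Show which device each module actually landed on
# (hf_device_map is populated when device_map is passed to from_pretrained).
for name, device in model.hf_device_map.items():
    print(f"{name} -> {device}")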
