
Llama-Vision FT Error: The number of images in each batch [1, 1] should be the same [1, 1] should be the same #37

Open
JunMa11 opened this issue Sep 25, 2024 · 3 comments


JunMa11 commented Sep 25, 2024

Dear all,

Thank you so much for sharing the Llama 3.2 vision model fine-tuning script so quickly!

I got the following error when running the demo:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 11.36it/s]
trainable params: 31,416,320 || all params: 10,674,357,795 || trainable%: 0.2943
/home/jma/anaconda3/lib/python3.12/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /opt/conda/conda-bld/pytorch_1720538439675/work/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
/home/jma/anaconda3/lib/python3.12/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|          | 0/2144 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/jma/Documents/llama3/finetune_demo.py", line 86, in <module>
    trainer.train()
  File "/home/jma/anaconda3/lib/python3.12/site-packages/transformers/trainer.py", line 2043, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/lib/python3.12/site-packages/transformers/trainer.py", line 2345, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/jma/anaconda3/lib/python3.12/site-packages/accelerate/data_loader.py", line 550, in __iter__
    current_batch = next(dataloader_iter)
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/Documents/llama3/finetune_demo.py", line 47, in process
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jma/anaconda3/lib/python3.12/site-packages/transformers/models/mllama/processing_mllama.py", line 308, in __call__
    raise ValueError(
ValueError: The number of images in each batch [1, 1] should be the same [1, 1] should be the same. Yes, the model does not support having a different number of images per batch.
  0%|          | 0/2144 [00:00<?, ?it/s]     

I simply copied the code into a Python script:

from transformers import MllamaForConditionalGeneration, AutoProcessor, BitsAndBytesConfig, MllamaProcessor
from peft import LoraConfig, get_peft_model
import torch
from transformers import Trainer
from datasets import load_dataset
from transformers import TrainingArguments

ds = load_dataset("merve/vqav2-small", split="validation[:10%]")

ckpt = "meta-llama/Llama-3.2-11B-Vision"
USE_LORA = True
# if full fine-tune, you can opt to freeze image part

if USE_LORA:
    lora_config = LoraConfig(
        r=8,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules=['down_proj','o_proj','k_proj','q_proj','gate_proj','up_proj','v_proj'],
        use_dora=True, # optional DoRA 
        init_lora_weights="gaussian"
    )

    model = MllamaForConditionalGeneration.from_pretrained(
            ckpt,
            torch_dtype=torch.bfloat16,
            device_map="auto"
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

else:
    model = MllamaForConditionalGeneration.from_pretrained(ckpt,
        torch_dtype=torch.bfloat16, device_map="auto")
    # freeze vision model to save up on compute
    for param in model.vision_model.parameters():
        param.requires_grad = False


processor = AutoProcessor.from_pretrained(ckpt)

def process(examples):
    texts = [f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<|image|>{example['question']} Answer briefly. <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{example['multiple_choice_answer']}<|eot_id|>" for example in examples]
    images = [[example["image"].convert("RGB")] for example in examples]

    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100 
    labels[labels == 128256] = -100 # image token index
    batch["labels"] = labels
    batch = batch.to(torch.bfloat16).to("cuda")

    return batch



args=TrainingArguments(
            num_train_epochs=2,
            remove_unused_columns=False,
            per_device_train_batch_size=1,
            #gradient_accumulation_steps=4,
            #warmup_steps=2,
            learning_rate=2e-5,
            weight_decay=1e-6,
            adam_beta2=0.999,
            logging_steps=250,
            save_strategy="no",
            optim="adamw_hf",
            push_to_hub=True,
            save_total_limit=1,
            bf16=True,
            output_dir="./lora",
            dataloader_pin_memory=False,
            gradient_checkpointing=True
        )


trainer = Trainer(
        model=model,
        train_dataset=ds,
        data_collator=process,
        args=args
        )

trainer.train()

Any suggestions for solving this issue would be highly appreciated.
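
For what it's worth, the ValueError is raised because the mllama processor requires every sample in a batch to carry the same number of images. Here is a minimal check (my own addition, not part of the demo) that could be called on `images` in `process`, right before the `processor(...)` call, to confirm what the processor actually receives:

def check_image_counts(images):
    # images: list of per-sample image lists, exactly as passed to the processor.
    # The mllama processor requires every sample in a batch to carry the same
    # number of images; uneven counts trigger the ValueError above.
    counts = [len(imgs) for imgs in images]
    if len(set(counts)) != 1:
        raise ValueError(f"uneven image counts per sample: {counts}")
    return counts

With per_device_train_batch_size=1 and one image per sample, I would expect every count to be 1, which makes the reported [1, 1] all the more confusing.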

pcuenca (Member) commented Sep 27, 2024

Hi @JunMa11, thanks for trying it out! I couldn't reproduce your issue, unfortunately.

  • Are you using transformers 4.45.1, which was released last night? (A quick version check is shown below.)
  • Do you see the problem at the beginning of training, or do you see some progress until it happens?
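
To confirm the installed version, something like this should work:

import transformers
print(transformers.__version__)  # expecting 4.45.1 or newer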

In addition, I would recommend the following configuration for an initial test run: enable LoRA, disable DoRA, and disable gradient checkpointing.
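
Concretely, against the script above that would look something like this (a sketch, only the LoRA config shown; also drop gradient_checkpointing=True from TrainingArguments, which defaults to False):

from peft import LoraConfig

# Same settings as in the script above, but with DoRA turned off.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=['down_proj', 'o_proj', 'k_proj', 'q_proj',
                    'gate_proj', 'up_proj', 'v_proj'],
    use_dora=False,  # DoRA off for the initial test run
    init_lora_weights="gaussian",
)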

merveenoyan (Contributor) commented

@JunMa11 thanks for reporting! I will reproduce the error now and clarify/modify things in the notebook if necessary.

merveenoyan (Contributor) commented

Hello @JunMa11, I couldn't reproduce your issue on transformers 4.45.1 😔. I need more details.
