
Issue with ppo_trainer.generate() #17

Open
aishu194 opened this issue Aug 20, 2023 · 2 comments

@aishu194

Thank you for the clear, amazing video tutorial and repo. I have been working with this repo and hit the following issue on an 8-GPU A100 machine (100 GB OS disk, 5 TB external). Could you kindly help me with this?

Traceback (most recent call last):
  File "rl_finetuning.py", line 175, in <module>
    response_tensor = ppo_trainer.generate(query_tensor, pad_token_id=tokenizer.eos_token_id, max_new_tokens=20)
  File "/data-mount/trl/trl/trainer/ppo_trainer.py", line 450, in generate
    response = self.accelerator.unwrap_model(self.model).generate(
  File "/data-mount/trl/trl/models/modeling_value_head.py", line 198, in generate
    return self.pretrained_model.generate(*args, **kwargs)
  File "/home/aishu/.local/lib/python3.8/site-packages/peft/peft_model.py", line 977, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/home/aishu/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/aishu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 1642, in generate
    return self.sample(
  File "/home/aishu/.local/lib/python3.8/site-packages/transformers/generation/utils.py", line 2724, in sample
    outputs = self(
  File "/home/aishu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aishu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 809, in forward
    outputs = self.model(
  File "/home/aishu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aishu/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 628, in forward
    batch_size, seq_length = input_ids.shape
ValueError: too many values to unpack (expected 2)

chunhualiao commented Aug 26, 2023

I encountered the same error. Using python -m pdb, I inspected the tensor shapes at runtime.

The query tensor had the correct 2-D shape initially:

(Pdb) up
> ..../pytorch1.13.1/lib/python3.9/site-packages/trl/trainer/ppo_trainer.py(454)generate()
-> response = self.accelerator.unwrap_model(self.model).generate(
(Pdb) p query_tensor
tensor([[    1, 12027,  7420,   278,  2224,  3021,   952, 29875,  3002,   310,
           379,  5667, 29914, 29909,  1367, 29903, 29889,  4121,   993, 13676,
         17091,  5065,  3381,   322,   521,   342,  6788,   363,   278,  4940,
          4723, 29889]], device='cuda:0')
(Pdb) p query_tensor.shape
torch.Size([1, 32])

But the code at ppo_trainer.py line 455 then adds another dimension with the following statement:

    input_ids=query_tensor.unsqueeze(dim=0)

As a result, by the time the code reaches

    pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 623
    batch_size, seq_length = input_ids.shape

the input has gained an extra leading dimension:

(Pdb) p input_ids.shape
torch.Size([1, 1, 32])

The assignment tries to unpack a 3-D shape into two variables, triggering the error:

ValueError: too many values to unpack (expected 2)
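The shape arithmetic can be reproduced without a GPU or torch. Below is a minimal pure-Python sketch: plain tuples stand in for tensor shapes, and a hypothetical unsqueeze_shape helper mimics what Tensor.unsqueeze(dim=0) does to a shape.

```python
def unsqueeze_shape(shape):
    """Mimic Tensor.unsqueeze(dim=0): prepend a singleton dimension to a shape."""
    return (1,) + tuple(shape)

# A 1-D query of 32 tokens: after the internal unsqueeze this becomes
# a valid 2-D (batch_size, seq_length) shape.
query_1d = (32,)
ok = unsqueeze_shape(query_1d)   # (1, 32)
batch_size, seq_length = ok      # unpacks cleanly

# An already-batched 2-D query reproduces the bug: the internal
# unsqueeze yields a 3-D shape.
query_2d = (1, 32)
bad = unsqueeze_shape(query_2d)  # (1, 1, 32)
try:
    batch_size, seq_length = bad
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```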

This seems to be a bug in the package.

Somebody reported a similar problem and a solution: https://stackoverflow.com/questions/67193312/huggingface-transformers-returning-valueerror-too-many-values-to-unpack-expec

Essentially, the code needs to ignore the extra leading dimension by using something like

    _, batch_size, seq_length = input_ids.shape

@chunhualiao

I suggest adding "ValueError: too many values to unpack (expected 2)" to the issue title so others can find this error more easily.
