
PR #389 breaks Flash Attention 2 with peft #790

Closed
2 of 4 tasks
rationalism opened this issue Aug 5, 2023 · 11 comments

Comments

@rationalism

System Info

peft = 0.4.0
accelerate = 0.21.0
transformers = 4.31.0

Ubuntu 22.04, PyTorch 2.0.1, CUDA 11.8, NVIDIA A6000, Python 3.10

Who can help?

@pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Sorry to keep harping on this (issues #422 and #423), but the type casting in PR #389 now breaks Flash Attention for Llama with PEFT / QLoRA, as Flash Attention only works with fp16/bf16. Here is the relevant code:

https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/modeling_flash_llama.py

Without Flash Attention, running with the full context window (now 4,096 tokens for Llama 2) fails due to a lack of GPU memory.

Expected behavior

The model (Llama 2) should run, but instead this leads to a type error:

RuntimeError: FlashAttention only support fp16 and bf16 data type

See also this issue: artidoro/qlora#221

Monkey-patching out the upcasting in other.py fixes the issue.
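
For reference, here is a rough sketch of the kind of monkey-patch I mean, assuming the upcast lives in prepare_model_for_kbit_training in peft/utils/other.py (as in PEFT 0.4.0; the exact location may differ across versions):

import torch
import peft
from peft.utils import other as peft_other

_original_prepare = peft_other.prepare_model_for_kbit_training

def prepare_without_fp32_upcast(model, use_gradient_checkpointing=True):
    # Run the stock preparation (freeze base weights, enable gradient checkpointing, etc.) ...
    model = _original_prepare(model, use_gradient_checkpointing)
    # ... then undo the fp32 upcast so activations stay in bf16 for Flash Attention.
    # Note this also downcasts any parameters that were fp32 to begin with.
    for param in model.parameters():
        if param.dtype == torch.float32:
            param.data = param.data.to(torch.bfloat16)
    return model

# Patch both the public and the internal name before preparing the model.
peft.prepare_model_for_kbit_training = prepare_without_fp32_upcast
peft_other.prepare_model_for_kbit_training = prepare_without_fp32_upcast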

@philschmid

You can check out my blog: https://www.philschmid.de/instruction-tune-llama-2 it includes flash attention and works with peft.

@davidsvaughn

@philschmid

Thanks for your blog - it's helped me immensely! I followed the steps you gave at https://www.philschmid.de/instruction-tune-llama-2 and was able to get Flash Attention working (on an H100), but only after adding a couple of extra lines. Initially I was getting the same error: RuntimeError: FlashAttention only support fp16 and bf16 data type

but I was finally able to solve it by following this line (from your instructions):

model = get_peft_model(model, peft_config)

with these two extra lines (my addition):

from utils.llama_patch import upcast_layer_for_flash_attention
model = upcast_layer_for_flash_attention(model, torch.bfloat16)

Just FYI. I'm not sure why this worked (or why you didn't need this but I did), but I just thought it might be of interest to you, and possibly helpful to others. (BTW, for the sake of others besides @philschmid, utils.llama_patch comes from here: https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/llama_patch.py)
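
In case it helps, what that helper roughly does (an approximate sketch, not the verbatim contents of llama_patch.py) is walk the model after get_peft_model and cast the LoRA layers, norms, and embedding/lm_head to the compute dtype, so the attention inputs are bf16/fp16 as Flash Attention requires:

import torch
from peft.tuners.lora import LoraLayer

def upcast_layer_for_flash_attention(model, torch_dtype):
    # Cast the modules that PEFT's k-bit preparation leaves in fp32 back to
    # the compute dtype (e.g. torch.bfloat16) expected by Flash Attention.
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            module.to(torch_dtype)
        if "norm" in name:
            module.to(torch_dtype)
        if "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight"):
                module.to(torch_dtype)
    return model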

@juhoward

@davidsvaughn
Thanks for this comment. I apparently had the same issue, which these two lines of code solved.

@allanton

allanton commented Sep 8, 2023

I was following @philschmid's blog (extremely useful btw, thank you!) and got the same error. @davidsvaughn's comment also worked for me, so thank you!

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@sids07

sids07 commented Oct 16, 2023

@davidsvaughn and @fullanton which package is utils imported from? I am getting a "utils module not found" error. Please help.


github-actions bot commented Nov 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@allanton

allanton commented Nov 9, 2023

@davidsvaughn and @fullanton which package is utils imported from? I am getting a "utils module not found" error. Please help.

Hey @sids07, this is from here: https://github.com/philschmid/deep-learning-pytorch-huggingface/tree/main/training/

@RonanKMcGovern

@younesbelkada should this issue be fixed by now? I ran into a similar problem today running https://huggingface.co/larryvrh/Yi-6B-200K-Llamafied.


github-actions bot commented Dec 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@younesbelkada
Contributor

I think this should be fixed with the latest PEFT! Closing it for now; feel free to open a new issue if that's not the case.
