PR #389 breaks Flash Attention 2 with peft #790
Comments
You can check out my blog: https://www.philschmid.de/instruction-tune-llama-2. It includes flash attention and works with PEFT.
Thanks for your blog - it's helped me immensely! I followed the steps you gave at https://www.philschmid.de/instruction-tune-llama-2 and was able to get flash attention working (on an H100), but only after adding a couple of extra lines. Initially I was getting the same error, but I was finally able to solve it by following this line (from your instructions):
with these two extra lines (my addition):
Just FYI. I'm not sure why this worked (or why you didn't need this but I did), but I just thought it might be of interest to you, and possibly helpful to others. (BTW, for the sake of others besides @philschmid, ...)
@davidsvaughn
I was following @philschmid's blog (extremely useful btw, thank you!) and got the same error. @davidsvaughn's comment also worked for me, so ty!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
@davidsvaughn and @fullanton, which package is utils imported from? I am getting a "utils module not found" error. Please help!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hey @sids07, this is from here: https://github.com/philschmid/deep-learning-pytorch-huggingface/tree/main/training/
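For anyone else hitting that import error: the helpers come from utils/llama_patch.py inside that training/ folder, so the script has to be run from there (or the folder added to PYTHONPATH). A rough usage sketch follows; the helper names and call order are my recollection of that repo, so double-check them against the current code there:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumption: run from the training/ directory of that repo so that
# utils/llama_patch.py is importable; the helper names below are taken from
# that repo and may have changed since, so treat this as a sketch.
from utils.llama_patch import replace_attn_with_flash_attn, upcast_layer_for_flash_attention

# 1) Patch the Llama attention implementation before the model is created.
replace_attn_with_flash_attn()

# 2) Load the model as in the blog post (the 4-bit / QLoRA setup is omitted here).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto"
)

# 3) Cast layer norms / embeddings so the patched attention receives bf16 inputs.
model = upcast_layer_for_flash_attention(model, torch.bfloat16)
```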
@younesbelkada, should this latest issue be fixed? I had a similar issue running https://huggingface.co/larryvrh/Yi-6B-200K-Llamafied today.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
This should be fixed, I think, with the latest PEFT! Closing it for now; feel free to open a new issue if that's not the case.
System Info
peft = 0.4.0
accelerate = 0.21.0
transformers = 4.31.0
Ubuntu 22.04, PyTorch 2.0.1, CUDA 11.8, NVIDIA A6000, Python 3.10
Who can help?
@pacman100
Information

Tasks

An officially supported task in the examples folder

Reproduction
Sorry to keep harping on this (see issues #422 and #423), but the type casting introduced in PR #389 now breaks Flash Attention for Llama with PEFT / QLoRA, since Flash Attention only works with fp16/bf16. Here is the relevant code:
https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/modeling_flash_llama.py
Without Flash Attention, runs that use the full context window (now 4,096 tokens for Llama 2) fail for lack of GPU memory. A condensed reproduction sketch is shown below.
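To make the failure mode concrete, here is a sketch of the kind of setup that triggers it (the model id, 4-bit config, and dummy forward pass are illustrative, not my exact training script): after PR #389, prepare_model_for_kbit_training upcasts the remaining fp16/bf16 weights to fp32, and the flash-attention forward then rejects the resulting fp32 activations.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Illustrative: a Llama checkpoint whose remote modeling code uses Flash
# Attention (e.g. the modeling_flash_llama.py linked above), loaded in 4-bit.
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/LLaMA-2-7B-32K",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

# Since PR #389, this call (in peft/utils/other.py) casts the non-quantized
# weights (layer norms, embeddings, lm_head) up to fp32 ...
model = prepare_model_for_kbit_training(model)

# ... so the attention layers now see fp32 activations and the forward pass
# fails with: RuntimeError: FlashAttention only support fp16 and bf16 data type
dummy_input = torch.ones((1, 8), dtype=torch.long, device=model.device)
model(dummy_input)
```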
Expected behavior
The model (Llama 2) should run, but instead it raises a type error:
RuntimeError: FlashAttention only support fp16 and bf16 data type
See also this issue: artidoro/qlora#221
Monkey-patching out the upcasting in other.py fixes the issue.
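For reference, rather than patching peft/utils/other.py directly, one possible workaround sketch (the helper name below is mine, not a PEFT API) is to cast the upcast fp32 weights back down to bf16 after prepare_model_for_kbit_training has run, so the flash-attention kernels receive a supported dtype again:

```python
import torch
from torch import nn

def downcast_fp32_params(model: nn.Module, dtype: torch.dtype = torch.bfloat16) -> nn.Module:
    """Hypothetical helper: cast any fp32 parameters (the layer norms, embeddings
    and lm_head that PEFT upcast) back to `dtype` so FlashAttention accepts them."""
    for param in model.parameters():
        if param.dtype == torch.float32:
            param.data = param.data.to(dtype)
    return model

# Usage, immediately after prepare_model_for_kbit_training(model):
# model = downcast_fp32_params(model, torch.bfloat16)
```

This is, I believe, essentially what the upcast helper in @philschmid's repo does for the layer norms; the trade-off is giving up the fp32 norms that PR #389 introduced, presumably for training stability.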