Issues: Dao-AILab/flash-attention
Open issues, newest first:
ERROR: No matching distribution found for flash-attn==2.6.3+cu123torch2.4cxx11abifalse (#1423, opened Jan 6, 2025 by carolynsoo)

Unable to install flash-attn even if I first install torch alone (#1421, opened Jan 3, 2025 by ytxmobile98)

Is there a plan to support flash_attn_varlen_backward with fp8 (#1420, opened Jan 3, 2025 by gaodaheng)

flash_attn_with_kvcache discrepancy slicing kv_cache / cache_seqlens (#1417, opened Jan 1, 2025 by jeromeku)

Looking for a test to compare the result with the KV cache updated in place and without the KV cache (#1414, opened Dec 26, 2024 by chakpongchung)

Performance Impact of Using Three Warps per Group (WG) in FA3 Compared to Two WGs (#1413, opened Dec 24, 2024 by ziyuhuang123)

UnboundLocalError: local variable 'out' referenced before assignment (#1412, opened Dec 24, 2024 by chuangzhidan)

Why Does FA3 Use Registers Instead of Directly Accessing SMEM with WGMMA on SM90? (#1407, opened Dec 23, 2024 by ziyuhuang123)

Is flash_attn_with_kvcache() supposed to work for seqlen > 1? (#1402, opened Dec 20, 2024 by vince62s)

Understanding the Role of arrive in NamedBarrier Synchronization (#1400, opened Dec 19, 2024 by ziyuhuang123)

Why Doesn't FlashAttention3 Allow KV and O to Share Memory Space? (#1396, opened Dec 18, 2024 by ziyuhuang123)

g2s K tensor when handling padding in the seq_k: clear it rather than keeping the default SMEM values (#1395, opened Dec 18, 2024 by NVIDIA-JerryChen)

Why does NamedBarrier in epilogue use NumMmaThreads(256) + NumThreadsPerWarp(32)? (#1389, opened Dec 16, 2024 by ziyuhuang123)
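Issue #1412 above reports an `UnboundLocalError`. The following is a generic Python sketch of how that error class arises (it is not the flash-attn code path, and the function name is hypothetical): a name assigned anywhere inside a function is treated as local for the whole function body, so reading it on a branch that skipped the assignment fails.

```python
# Generic illustration of Python's UnboundLocalError (hypothetical
# function, not flash-attn internals): 'out' is bound only on one
# branch, so the other branch reads an unassigned local variable.
def attention_dispatch(use_fast_path: bool) -> str:
    if use_fast_path:
        out = "fast-path result"  # 'out' bound only on this branch
    return out  # UnboundLocalError when use_fast_path is False

print(attention_dispatch(True))

try:
    attention_dispatch(False)
except UnboundLocalError as e:
    print(f"caught: {e}")
```

The usual fix is to initialize `out` (or raise explicitly) before the conditional, so every path either binds the variable or fails with a clear error.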