CUDA: Fix new mma detection for Turing cards with Volta PTX #12187
We are seeing that this change incorrectly disabled flash attention for Turing cards (cc=75) when llama.cpp was compiled for Volta cards only (cc=70). To fix, check that we have compiled for Volta or greater, and that the card is Turing or greater. If there is a better way to fix, please do advise.
To reproduce the breakage on the current build, compile with architecture 70 and without architecture 75, then generate with flash attention on a Turing card.
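For clarity, here is a minimal sketch of the intended condition described above. The names used (`new_mma_available`, `highest_compiled_cc`, `device_cc`, and the `CC_*` constants) are illustrative assumptions, not necessarily the identifiers used in ggml-cuda:

```cpp
// Illustrative sketch only -- names and constants below are assumptions,
// not the actual ggml-cuda identifiers.
// Idea: the new mma path (used by flash attention) should be enabled when
// the binary was compiled for Volta (cc 70) or newer AND the device it runs
// on is Turing (cc 75) or newer, since a Turing card can run Volta PTX.
constexpr int CC_VOLTA  = 70;
constexpr int CC_TURING = 75;

static bool new_mma_available(int highest_compiled_cc, int device_cc) {
    return highest_compiled_cc >= CC_VOLTA && device_cc >= CC_TURING;
}
```

With only architecture 70 compiled, this check would still report true on a cc=75 device, which matches the fix described above.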