Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off (m…
…icrosoft#14404) ### Description Fixes unused `use_memory_efficient_attention` variable in contrib_ops/cuda/bert/attention_impl.cu. ### Motivation and Context ORT with CUDA version < 11.6 fails to build for release configurations due to an unused variable. ```shell c:\...\onnxruntime\onnxruntime\contrib_ops\cuda\bert\attention_impl.cu(420): error : variable "use_memory_efficient_attention" was declared but never referenced [C:\...\onnxruntime\build\Windows\RelWithDebInfo\onnx runtime_providers_cuda.vcxproj] detected during instantiation of "onnxruntime::common::Status onnxruntime::contrib::cuda::QkvToContext(const cudaDeviceProp &, cublasHandle_t &, cudaStream_t, onnxruntime::contrib::AttentionParameters &, onnxruntime::contrib::cuda::AttentionData<T> &) [wit h T=float]" (923): here ``` This happens for CUDA < 11.6. Our cmake script turns off onnxruntime_USE_FLASH_ATTENTION for CUDA < 11.6, which leaves the aforementioned variable unused outside of asserts (which are removed in release builds). The USE_FLASH_ATTENTION option was added by microsoft#14343
- Loading branch information