
Fix compilation error with multiple accelerators #282

Merged

Conversation

chillenzer (Contributor):
The inline assembly instructions for CUDA are __device__-only, but ALPAKA_FN_ACC resolves to __host__ __device__ unless we are in CUDA-only mode, which leads to compilation failures. This PR fixes that by applying __device__ by hand in those cases.

@chillenzer chillenzer added the bug label Feb 5, 2025
@chillenzer chillenzer added this to the 3.0.0 milestone Feb 5, 2025
@@ -164,7 +164,7 @@ namespace mallocMC

 #ifdef ALPAKA_ACC_GPU_HIP_ENABLED
     template<typename TDim, typename TIdx>
-    ALPAKA_FN_ACC inline auto lanemask_lt(alpaka::AccGpuHipRt<TDim, TIdx> const& /*acc*/)
+    ALPAKA_FN_ACC inline uint32_t lanemask_lt(alpaka::AccGpuHipRt<TDim, TIdx> const& /*acc*/)
Member:

ALPAKA_FN_ACC should maybe be changed to __device__ too.

Member:

BUG: lanemask_lt()'s return type must be auto, because for AMD HIP the return type is 64-bit while for CUDA it is a 32-bit type.

@chillenzer chillenzer force-pushed the fix-cuda-only-mode-necessity branch from a83cd69 to 9705fa1 on February 6, 2025 at 10:17
@psychocoderHPC psychocoderHPC merged commit 1feba07 into alpaka-group:dev Feb 6, 2025
2 checks passed