Vector types for generic (kernel-) code #4719

pauleonix · 2025-05-16T11:46:40Z

pauleonix
May 16, 2025

I just saw #4674 from @davebayer introducing the tuple protocol for vector types, which sounds great for generic/templated (tunable) device code.

Are there any plans to support vector types that are not part of CUDA because they have more than 4 elements, but that would still fit within vectorized loads, such as char16? Using the tuple protocol, these should fit in nicely.

Also, for generic code using the tuple protocol an unrolled loop template like https://stackoverflow.com/a/46873787/10107454 makes sense (as it gives actual constexpr loop indices unlike #pragma unroll), so maybe that could also be part of libcu++'s extended API?
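To make the idea concrete, here is a minimal sketch of such an unrolled loop in plain C++17. The names static_for, static_for_impl and sum_elements are hypothetical and not part of libcu++; in device code the functions would additionally need __host__ __device__ annotations. The point is that the loop body receives each index as a std::integral_constant, so it can be used as a template argument, e.g. for get<I>.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical helper (not a libcu++ API): expands to one call of f per
// index in [0, N) via a fold expression, so there is no runtime loop.
template <class F, std::size_t... Is>
constexpr void static_for_impl(F& f, std::index_sequence<Is...>)
{
  (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, class F>
constexpr void static_for(F&& f)
{
  static_for_impl(f, std::make_index_sequence<N>{});
}

// Example: sum a tuple-like object through std::get<I>, which requires a
// compile-time index and therefore cannot be written with #pragma unroll.
template <class T, std::size_t N>
constexpr T sum_elements(const std::array<T, N>& a)
{
  T total{};
  static_for<N>([&](auto i) { total += std::get<decltype(i)::value>(a); });
  return total;
}

// Evaluated entirely at compile time.
static_assert(sum_elements(std::array<int, 4>{1, 2, 3, 4}) == 10);

// Runtime usage of the same helper.
inline void static_for_example()
{
  std::array<int, 3> values{5, 6, 7};
  assert(sum_elements(values) == 18);
}
```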

miscco · 2025-05-16T12:00:17Z

miscco
May 16, 2025
Collaborator

I am open to discussing extending the tuple protocol to more types.

I am a bit wary about the pragma unroll suggestion, because it is a nontrivial change to the code and might also affect e.g. binary size.

1 reply

pauleonix May 16, 2025
Author

Nontrivial change to which code? While I guess one can also discuss this for CUB algorithms that have unrolled loops, for now I was just proposing it as a utility for users.

How does this kind of unrolling affect binary size differently from using #pragma unroll or unrolling manually (which is what one currently needs to do with CUDA vector types, assuming you don't somehow memcpy or reinterpret_cast to some other type)? Are you wary because the compiler can ignore #pragma unroll if it deems it too extreme? Do you fear that users will use it in situations where it is not needed to avoid local memory?

fbusato · 2025-05-16T16:12:12Z

fbusato
May 16, 2025
Collaborator

> Are there any plans to support vector types that are not part of CUDA because they have more than 4 elements, but that would still fit within vectorized loads, such as char16? Using the tuple protocol, these should fit in nicely.

This makes a lot of sense, especially considering that the maximum access size exposed in CUDA 12.9 - PTX is 32 bytes, so even char32 works fine.
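As a rough illustration of how a wider vector type could opt into the tuple protocol, here is a host-compilable C++17 sketch. The type char16_vec, its layout and the free get functions are assumptions made for this example, not an existing CUDA or libcu++ type. It uses std:: rather than cuda::std:: only so that it compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical 16-element vector type. The 16-byte alignment is what would
// let a single 128-bit vectorized load or store cover the whole object.
struct alignas(16) char16_vec
{
  signed char data[16];
};

static_assert(sizeof(char16_vec) == 16 && alignof(char16_vec) == 16);

// Tuple protocol, part 1: size and element type.
namespace std
{
template <>
struct tuple_size<char16_vec> : integral_constant<size_t, 16>
{};

template <size_t I>
struct tuple_element<I, char16_vec>
{
  static_assert(I < 16, "index out of range");
  using type = signed char;
};
} // namespace std

// Tuple protocol, part 2: get<I> with a compile-time bounds check.
template <std::size_t I>
constexpr signed char& get(char16_vec& v) noexcept
{
  static_assert(I < 16, "index out of range");
  return v.data[I];
}

template <std::size_t I>
constexpr const signed char& get(const char16_vec& v) noexcept
{
  static_assert(I < 16, "index out of range");
  return v.data[I];
}

// Usage: element access through the protocol instead of .x/.y/.z/.w.
inline void char16_example()
{
  char16_vec v{};
  get<15>(v) = 42;
  assert(get<15>(v) == 42);
  assert(get<0>(v) == 0);
}
```

A 32-byte variant would follow the same pattern with alignas(32) and 32 elements.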

> Also, for generic code using the tuple protocol an unrolled loop template like stackoverflow.com/a/46873787/10107454 makes sense (as it gives actual constexpr loop indices unlike #pragma unroll), so maybe that could also be part of libcu++'s extended API?

There are situations where the compiler doesn't actually unroll a loop, even with #pragma unroll. This is the reason CUTLASS and other libraries have their own version of a compile-time loop. I recently proposed similar functionality to @miscco.
This also aligns with similar functionality in C++26, see https://pydong.org/posts/variadic-switch/

4 replies

pauleonix May 16, 2025
Author

I'm not up-to-date with C++26, but template for (constexpr auto Idx : std::views::iota{0uz, size}) looks great, yes! Thanks for the link!

miscco May 16, 2025
Collaborator

Isn't that more in the direction of std::simd?

pauleonix May 16, 2025
Author

@miscco Given that there isn't much in terms of vector-compute instructions in CUDA/PTX and that SIMD loads on the CPU do not have to be aligned (I heard that nowadays the performance-hit of unaligned SIMD loads isn't a big issue anymore), I'm not sure how well it maps. https://eel.is/c++draft/simd.subscr is also not restricted to constexpr indices.

pauleonix May 16, 2025
Author

Either way I think a potential implementation of cuda::std::simd would make heavy use of this kind of unrolled loop if it is based on cuda::vector_type.
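As a sketch of what I mean (not an actual cuda::std::simd implementation), an element-wise operation over any tuple-like vector type can be written with a pack expansion over compile-time indices. The name elementwise_add is hypothetical, and std::array stands in for a vector type here because it already implements the tuple protocol.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical helper: adds two tuple-like vectors element by element.
// The fold expression expands to one statement per element, so every
// get<I> sees a compile-time index.
template <class V, std::size_t... Is>
constexpr V elementwise_add_impl(const V& a, const V& b, std::index_sequence<Is...>)
{
  using std::get; // also allow get overloads found via ADL
  V result{};
  ((get<Is>(result) = get<Is>(a) + get<Is>(b)), ...);
  return result;
}

template <class V>
constexpr V elementwise_add(const V& a, const V& b)
{
  return elementwise_add_impl(a, b, std::make_index_sequence<std::tuple_size<V>::value>{});
}

// Compile-time check with std::array as the stand-in vector type.
static_assert(elementwise_add(std::array<int, 4>{1, 2, 3, 4},
                              std::array<int, 4>{10, 20, 30, 40})[2] == 33);

inline void elementwise_add_example()
{
  std::array<float, 2> a{1.5f, 2.0f};
  std::array<float, 2> b{0.5f, 3.0f};
  std::array<float, 2> c = elementwise_add(a, b);
  assert(c[0] == 2.0f);
  assert(c[1] == 5.0f);
}
```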
