Use 128-bit data loading #326

kitecats · 2025-05-31T02:06:55Z

For 128-bit vectorized loads, we should use `AutoVectorizingCopyWithAssumedAlignment<sizeof(uint128_t)*8>` instead of `AutoVectorizingCopyWithAssumedAlignment<sizeof(uint128_t)>`. The template parameter of `AutoVectorizingCopyWithAssumedAlignment` is a width in bits, whereas `sizeof(uint128_t)` returns 16 (bytes). Passing the byte count directly makes the template assume 16-bit alignment, so the copy is actually performed with `uint16_t` loads instead of the intended 128-bit vectorized loads.

DefTruth · 2025-06-02T09:21:52Z

@botbw Can you take a look at this PR?

botbw · 2025-06-02T09:29:41Z

> AutoVectorizingCopyWithAssumedAlignment

@kitecats @DefTruth Yeah, it's a typo here: `sizeof(uint128_t)` returns bytes, not bits. Thanks for pointing it out.

https://github.com/NVIDIA/cutlass/blob/9d165a3b8ef446a7ff3db198413f82bcb83f46fe/include/cute/arch/copy.hpp#L68-L81

botbw

Thanks for fixing!

DefTruth

LGTM~

Use 128-bit data loading

187af86

DefTruth self-requested a review June 2, 2025 09:21

botbw approved these changes Jun 2, 2025

DefTruth approved these changes Jun 2, 2025

DefTruth merged commit c10770b into xlite-dev:main Jun 2, 2025
