Open
Labels
Enhancement (New feature or request)
Description
Currently `shuffle1_dyn` on `u32x8` is not optimized: the fallback is used, which results in a whole mess of extract intrinsics. It is not very fast.
Can we please add support for `_mm256_permutevar8x32_epi32` and similar variants at the `u32x8` (and `f32x8`, etc.) level? It is a fairly large speedup.
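To illustrate the request, here is a minimal sketch of the lowering I have in mind, written against plain `[u32; 8]` arrays rather than the crate's own vector types. The function names (`permute_u32x8`, `permute_avx2`, `permute_scalar`) are hypothetical and are not part of any existing API. The intrinsic only looks at the low three bits of each index, so the scalar fallback masks the same way; whether `shuffle1_dyn` is meant to behave like that for out-of-range indices is an assumption on my part.

```rust
/// Scalar reference (hypothetical helper): out[i] = src[idx[i] & 7].
/// The mask mirrors the intrinsic, which uses only the low 3 bits of each index.
fn permute_scalar(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = src[(idx[i] & 7) as usize];
    }
    out
}

/// AVX2 path (hypothetical helper): a single cross-lane permute
/// instead of eight extract/insert pairs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn permute_avx2(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    use std::arch::x86_64::*;
    // Unaligned loads, since plain arrays carry no 32-byte alignment guarantee.
    let a = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
    let i = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    let r = _mm256_permutevar8x32_epi32(a, i);
    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

/// Dispatches to the AVX2 path when the CPU supports it at runtime,
/// otherwise falls back to the scalar loop.
pub fn permute_u32x8(src: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 availability was just checked.
            return unsafe { permute_avx2(src, idx) };
        }
    }
    permute_scalar(src, idx)
}
```

For example, `permute_u32x8([10, 11, 12, 13, 14, 15, 16, 17], [7, 6, 5, 4, 3, 2, 1, 0])` reverses the lanes. The `f32x8` case could presumably be handled the same way with `_mm256_permutevar8x32_ps`, which takes the same kind of integer index vector.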
Thanks