Implement avx512 compressstore intrinsics #1273
Conversation
@@ -38007,6 +38127,34 @@ extern "C" {
    #[link_name = "llvm.x86.avx512.mask.compress.pd.128"]
    fn vcompresspd128(a: f64x2, src: f64x2, mask: u8) -> f64x2;

    #[link_name = "llvm.x86.avx512.mask.compress.store.d.512"]
    fn vcompressd_mem(mem: *mut i8, data: i32x16, mask: u16);
Is there a better naming convention for these intrinsics? The asm mnemonic is the same for mem and reg operations.
The recommended way to figure this out is to look at what IR clang generates: https://godbolt.org/z/nvaxM4MGh
In this case it is calling the `llvm.masked.compressstore.v2f64` intrinsic, which unfortunately can't be called directly from Rust because it uses an `i1` vector, which can't be represented with Rust types.
This is the reason why #1254 implemented some of the AVX512 intrinsics using inline assembly instead. I think this is the right approach in this case as well.
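For reference, a minimal sketch of what such an inline-assembly implementation might look like, assuming Rust's `asm!` with the `kreg` and `zmm_reg` register classes (in the spirit of #1254, not the code in this PR):

```rust
use core::arch::asm;
use core::arch::x86_64::{__m512i, __mmask16};

#[inline]
#[target_feature(enable = "avx512f")]
pub unsafe fn _mm512_mask_compressstoreu_epi32(base_addr: *mut u8, k: __mmask16, a: __m512i) {
    // Memory-destination form of vpcompressd: the i32 lanes of `a` selected by
    // mask `k` are stored contiguously at `base_addr` (no alignment required).
    asm!(
        "vpcompressd [{base}]{{{mask}}}, {src}",
        base = in(reg) base_addr,
        mask = in(kreg) k,
        src = in(zmm_reg) a,
        options(nostack, preserves_flags),
    );
}
```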
These LLVM intrinsics seem to work, though, and I saw them used with plain integer masks in LLVM test cases: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll#L8938 (I'm not sure whether there is official documentation for them, though). I wanted to avoid asm unless absolutely necessary.
Test failures in CI seem unrelated to the changes in this PR.
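For context, the semantics of the memory form with a plain integer mask can be described with a small scalar model (a sketch for illustration, not code from this PR):

```rust
/// Scalar model of a masked compress-store over sixteen i32 lanes: the lanes of
/// `data` whose mask bit is set are written contiguously starting at `dst`
/// (lowest lane first); lanes with a clear mask bit are skipped entirely.
unsafe fn compress_store_model(dst: *mut i32, data: &[i32; 16], mask: u16) {
    let mut out = dst;
    for (i, &lane) in data.iter().enumerate() {
        if mask & (1u16 << i) != 0 {
            out.write_unaligned(lane);
            out = out.add(1);
        }
    }
}
```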
#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vpcompressd))]
pub unsafe fn _mm512_mask_compressstoreu_epi32(base_addr: *mut i8, k: __mmask16, a: __m512i) {
Intel's intrinsic guide uses a `void*` for `base_addr`, while the LLVM intrinsics use an `i8*`. Using a pointer of the correct datatype would be more ergonomic, but I'm not sure whether that might prevent using the intrinsics for actually unaligned data.
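To illustrate the ergonomics point, a hypothetical usage example assuming the `*mut i8` signature from the excerpt above (`append_positive` is just a made-up helper name):

```rust
use std::arch::x86_64::*;

// Compress the strictly positive lanes of a 16-lane i32 block onto the end of `dst`.
#[target_feature(enable = "avx512f")]
unsafe fn append_positive(src: &[i32; 16], dst: &mut Vec<i32>) {
    let v = _mm512_loadu_epi32(src.as_ptr());
    // Mask of lanes to keep: lanes whose value is greater than zero.
    let k = _mm512_cmpgt_epi32_mask(v, _mm512_setzero_si512());
    let n = k.count_ones() as usize;
    dst.reserve(n);
    // The destination has to be cast to the pointer type of the signature under
    // discussion (`*mut i8` in this PR); the store itself is unaligned, so any
    // byte address is acceptable.
    _mm512_mask_compressstoreu_epi32(dst.as_mut_ptr().add(dst.len()) as *mut i8, k, v);
    dst.set_len(dst.len() + n);
}
```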
The convention here is to use `*mut u8` where C uses void pointers. LLVM's `i8` doesn't mean anything, since LLVM IR types don't have signs: LLVM's `i8` is used for both Rust's `u8` and `i8`.
I missed that convention when implementing the masked load/store intrinsics, and there are several more intrinsics that don't already follow it. I can take a look at adjusting the `stdarch-verify` test to catch this and change the existing type differences. Since AVX512 is still unstable, that should be possible, I guess.
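To illustrate the kind of check meant here, a hypothetical helper in the spirit of such a `stdarch-verify` adjustment (not the tool's actual API; the real test compares parsed signatures from Intel's data files rather than raw strings):

```rust
// Hypothetical check: given the parameter type from Intel's intrinsic data and
// the corresponding Rust parameter type, flag `void*` parameters that are not
// exposed as `*mut u8` / `*const u8` on the Rust side.
fn void_ptr_follows_convention(intel_ty: &str, rust_ty: &str) -> bool {
    match intel_ty {
        "void*" => rust_ty == "*mut u8",
        "void const*" | "const void*" => rust_ty == "*const u8",
        _ => true, // non-void-pointer parameters are checked elsewhere
    }
}
```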
Implement avx512f compressstore intrinsics.