Skip to content

Implement avx512 compressstore intrinsics #1273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 24, 2022

Conversation

jhorstmann
Copy link
Contributor

Implement avx512f compressstore intrinsics.

@@ -38007,6 +38127,34 @@ extern "C" {
#[link_name = "llvm.x86.avx512.mask.compress.pd.128"]
fn vcompresspd128(a: f64x2, src: f64x2, mask: u8) -> f64x2;

#[link_name = "llvm.x86.avx512.mask.compress.store.d.512"]
fn vcompressd_mem(mem: *mut i8, data: i32x16, mask: u16);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better naming convention for these intrinsics? The asm mnemonic is the same for mem and reg operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended way to figure this out is to look at what IR clang generates: https://godbolt.org/z/nvaxM4MGh

In this case it is calling the llvm.masked.compressstore.v2f64 intrinsic which unfortunately can't be called directly from Rust because it uses a i1 vector which can't be represented with Rust types.

This is the reason why #1254 implemented some of the AVX512 intrinsics using inline assembly instead. I think this is the right approach in this case as well.

Copy link
Contributor Author

@jhorstmann jhorstmann Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These llvm intrinsic seem to work though, and I saw them used with plain integer masks in llvm testcases: https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll#L8938 (not sure if there is an official documentation for them though). I wanted to avoid asm unless absolutely necessary.

Test failures in CI seem unrelated to the changes in this PR.

#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vpcompressd))]
pub unsafe fn _mm512_mask_compressstoreu_epi32(base_addr: *mut i8, k: __mmask16, a: __m512i) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel's intrinsic guide uses a void* for base_addr, the llvm intrinsics use an i8*. Using a ptr of the correct datatype would be more ergonomic, but I'm not sure whether that might prevent using the intrinsics for actually unaligned data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention here is to use *mut u8 where C uses void pointers. LLVM's i8 doesn't mean anything since LLVM IR types don't have signs: LLVM's i8 is used for both of Rust's u8 and i8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that convention when implementing the masked load/store instructions, there are also several more intrinsics that did not already follow this convention. I can take a look at adjusting the stdarch-verify test to catch this and change any existing type differences. Since avx512 is still unstable that should be possible I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants