[CIR][X86] Implement lowering for AVX512 ktest builtins #169985
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment "Ping". The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord, or on the forums.
@llvm/pr-subscribers-clang
Author: AIT (GeneraluseAI)
Changes: part of #167765
Patch is 48.58 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/169985.diff
4 Files Affected:
diff --git a/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp b/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
index 0e43345bad6f1..1b6dd54d32646 100644
--- a/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenBuiltinX86.cpp
@@ -85,6 +85,36 @@ static mlir::Value getMaskVecValue(CIRGenBuilderTy &builder, mlir::Location loc,
return maskVec;
}
+static mlir::Value emitX86MaskAddLogic(CIRGenBuilderTy &builder,
+ mlir::Location loc,
+ const std::string &intrinsicName,
+ SmallVectorImpl<mlir::Value> &ops) {
+
+ auto intTy = cast<cir::IntType>(ops[0].getType());
+ unsigned numElts = intTy.getWidth();
+ mlir::Value lhsVec = getMaskVecValue(builder, loc, ops[0], numElts);
+ mlir::Value rhsVec = getMaskVecValue(builder, loc, ops[1], numElts);
+ mlir::Type vecTy = lhsVec.getType();
+ mlir::Value resVec = emitIntrinsicCallOp(builder, loc, intrinsicName, vecTy,
+ mlir::ValueRange{lhsVec, rhsVec});
+ return builder.createBitcast(resVec, ops[0].getType());
+}
+
+static mlir::Value emitX86MaskLogic(CIRGenBuilderTy &builder,
+ mlir::Location loc,
+ cir::BinOpKind binOpKind,
+ SmallVectorImpl<mlir::Value> &ops,
+ bool invertLHS = false) {
+ unsigned numElts = cast<cir::IntType>(ops[0].getType()).getWidth();
+ mlir::Value lhs = getMaskVecValue(builder, loc, ops[0], numElts);
+ mlir::Value rhs = getMaskVecValue(builder, loc, ops[1], numElts);
+
+ if (invertLHS)
+ lhs = builder.createNot(lhs);
+ return builder.createBitcast(builder.createBinop(loc, lhs, binOpKind, rhs),
+ ops[0].getType());
+}
+
mlir::Value CIRGenFunction::emitX86BuiltinExpr(unsigned builtinID,
const CallExpr *expr) {
if (builtinID == Builtin::BI__builtin_cpu_is) {
@@ -727,14 +757,40 @@ mlir::Value CIRGenFunction::emitX86BuiltinExpr(unsigned builtinID,
case X86::BI__builtin_ia32_vpcomuw:
case X86::BI__builtin_ia32_vpcomud:
case X86::BI__builtin_ia32_vpcomuq:
+ cgm.errorNYI(expr->getSourceRange(),
+ std::string("unimplemented X86 builtin call: ") +
+ getContext().BuiltinInfo.getName(builtinID));
+ return {};
case X86::BI__builtin_ia32_kortestcqi:
case X86::BI__builtin_ia32_kortestchi:
case X86::BI__builtin_ia32_kortestcsi:
- case X86::BI__builtin_ia32_kortestcdi:
+ case X86::BI__builtin_ia32_kortestcdi: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ cir::IntType ty = cast<cir::IntType>(ops[0].getType());
+ cir::IntAttr allOnesAttr =
+ cir::IntAttr::get(ty, APInt::getAllOnes(ty.getWidth()));
+ cir::ConstantOp allOnesOp = builder.getConstant(loc, allOnesAttr);
+ mlir::Value orOp = emitX86MaskLogic(builder, loc, cir::BinOpKind::Or, ops);
+ mlir::Value cmp =
+ cir::CmpOp::create(builder, loc, cir::CmpOpKind::eq, orOp, allOnesOp);
+ return builder.createCast(cir::CastKind::bool_to_int, cmp,
+ cgm.convertType(expr->getType()));
+ }
case X86::BI__builtin_ia32_kortestzqi:
case X86::BI__builtin_ia32_kortestzhi:
case X86::BI__builtin_ia32_kortestzsi:
- case X86::BI__builtin_ia32_kortestzdi:
+ case X86::BI__builtin_ia32_kortestzdi: {
+ mlir::Location loc = getLoc(expr->getExprLoc());
+ cir::IntType ty = cast<cir::IntType>(ops[0].getType());
+ cir::IntAttr allZerosAttr =
+ cir::IntAttr::get(ty, APInt::getZero(ty.getWidth()));
+ cir::ConstantOp allZerosOp = builder.getConstant(loc, allZerosAttr);
+ mlir::Value orOp = emitX86MaskLogic(builder, loc, cir::BinOpKind::Or, ops);
+ mlir::Value cmp =
+ cir::CmpOp::create(builder, loc, cir::CmpOpKind::eq, orOp, allZerosOp);
+ return builder.createCast(cir::CastKind::bool_to_int, cmp,
+ cgm.convertType(expr->getType()));
+ }
case X86::BI__builtin_ia32_ktestcqi:
case X86::BI__builtin_ia32_ktestzqi:
case X86::BI__builtin_ia32_ktestchi:
@@ -744,37 +800,70 @@ mlir::Value CIRGenFunction::emitX86BuiltinExpr(unsigned builtinID,
case X86::BI__builtin_ia32_ktestcdi:
case X86::BI__builtin_ia32_ktestzdi:
case X86::BI__builtin_ia32_kaddqi:
+ return emitX86MaskAddLogic(builder, getLoc(expr->getExprLoc()),
+ "x86.avx512.kadd.b", ops);
case X86::BI__builtin_ia32_kaddhi:
+ return emitX86MaskAddLogic(builder, getLoc(expr->getExprLoc()),
+ "x86.avx512.kadd.w", ops);
case X86::BI__builtin_ia32_kaddsi:
+ return emitX86MaskAddLogic(builder, getLoc(expr->getExprLoc()),
+ "x86.avx512.kadd.d", ops);
case X86::BI__builtin_ia32_kadddi:
+ return emitX86MaskAddLogic(builder, getLoc(expr->getExprLoc()),
+ "x86.avx512.kadd.q", ops);
case X86::BI__builtin_ia32_kandqi:
case X86::BI__builtin_ia32_kandhi:
case X86::BI__builtin_ia32_kandsi:
case X86::BI__builtin_ia32_kanddi:
+ return emitX86MaskLogic(builder, getLoc(expr->getExprLoc()),
+ cir::BinOpKind::And, ops);
case X86::BI__builtin_ia32_kandnqi:
case X86::BI__builtin_ia32_kandnhi:
case X86::BI__builtin_ia32_kandnsi:
case X86::BI__builtin_ia32_kandndi:
+ return emitX86MaskLogic(builder, getLoc(expr->getExprLoc()),
+ cir::BinOpKind::And, ops, true);
case X86::BI__builtin_ia32_korqi:
case X86::BI__builtin_ia32_korhi:
case X86::BI__builtin_ia32_korsi:
case X86::BI__builtin_ia32_kordi:
+ return emitX86MaskLogic(builder, getLoc(expr->getExprLoc()),
+ cir::BinOpKind::Or, ops);
case X86::BI__builtin_ia32_kxnorqi:
case X86::BI__builtin_ia32_kxnorhi:
case X86::BI__builtin_ia32_kxnorsi:
case X86::BI__builtin_ia32_kxnordi:
+ return emitX86MaskLogic(builder, getLoc(expr->getExprLoc()),
+ cir::BinOpKind::Xor, ops, true);
case X86::BI__builtin_ia32_kxorqi:
case X86::BI__builtin_ia32_kxorhi:
case X86::BI__builtin_ia32_kxorsi:
case X86::BI__builtin_ia32_kxordi:
+ return emitX86MaskLogic(builder, getLoc(expr->getExprLoc()),
+ cir::BinOpKind::Xor, ops);
case X86::BI__builtin_ia32_knotqi:
case X86::BI__builtin_ia32_knothi:
case X86::BI__builtin_ia32_knotsi:
- case X86::BI__builtin_ia32_knotdi:
+ case X86::BI__builtin_ia32_knotdi: {
+ cir::IntType intTy = cast<cir::IntType>(ops[0].getType());
+ unsigned numElts = intTy.getWidth();
+ mlir::Value resVec =
+ getMaskVecValue(builder, getLoc(expr->getExprLoc()), ops[0], numElts);
+ return builder.createBitcast(builder.createNot(resVec), ops[0].getType());
+ }
case X86::BI__builtin_ia32_kmovb:
case X86::BI__builtin_ia32_kmovw:
case X86::BI__builtin_ia32_kmovd:
- case X86::BI__builtin_ia32_kmovq:
+ case X86::BI__builtin_ia32_kmovq: {
+ // Bitcast to vXi1 type and then back to integer. This gets the mask
+ // register type into the IR, but might be optimized out depending on
+ // what's around it.
+ cir::IntType intTy = cast<cir::IntType>(ops[0].getType());
+ unsigned numElts = intTy.getWidth();
+ mlir::Value resVec =
+ getMaskVecValue(builder, getLoc(expr->getExprLoc()), ops[0], numElts);
+ return builder.createBitcast(resVec, ops[0].getType());
+ }
case X86::BI__builtin_ia32_kunpckdi:
case X86::BI__builtin_ia32_kunpcksi:
case X86::BI__builtin_ia32_kunpckhi:
diff --git a/clang/test/CIR/CodeGenBuiltins/X86/avx512bw-builtins.c b/clang/test/CIR/CodeGenBuiltins/X86/avx512bw-builtins.c
index 3522e2c7e50bf..9f0ca5874c589 100644
--- a/clang/test/CIR/CodeGenBuiltins/X86/avx512bw-builtins.c
+++ b/clang/test/CIR/CodeGenBuiltins/X86/avx512bw-builtins.c
@@ -115,3 +115,477 @@ __mmask32 test_kshiftri_mask32_out_of_range(__mmask32 A) {
return _kshiftri_mask32(A, 33);
}
+
+
+__mmask32 test_kadd_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kadd_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.call_llvm_intrinsic "x86.avx512.kadd.d"
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kadd_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[RES:%.*]] = call <32 x i1> @llvm.x86.avx512.kadd.d(<32 x i1> [[L]], <32 x i1> [[R]])
+ // LLVM: bitcast <32 x i1> [[RES]] to i32
+
+ // OGCG-LABEL: _kadd_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: call <32 x i1> @llvm.x86.avx512.kadd.d
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _kadd_mask32(A, B);
+}
+
+__mmask64 test_kadd_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kadd_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.call_llvm_intrinsic "x86.avx512.kadd.q"
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kadd_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[RES:%.*]] = call <64 x i1> @llvm.x86.avx512.kadd.q(<64 x i1> [[L]], <64 x i1> [[R]])
+ // LLVM: bitcast <64 x i1> [[RES]] to i64
+
+ // OGCG-LABEL: _kadd_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: call <64 x i1> @llvm.x86.avx512.kadd.q
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _kadd_mask64(A, B);
+}
+
+__mmask32 test_kand_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kand_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.binop(and, {{.*}}, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kand_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[RES:%.*]] = and <32 x i1> [[L]], [[R]]
+ // LLVM: bitcast <32 x i1> [[RES]] to i32
+
+ // OGCG-LABEL: _kand_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: and <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _kand_mask32(A, B);
+}
+
+__mmask64 test_kand_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kand_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.binop(and, {{.*}}, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kand_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[RES:%.*]] = and <64 x i1> [[L]], [[R]]
+ // LLVM: bitcast <64 x i1> [[RES]] to i64
+
+ // OGCG-LABEL: _kand_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: and <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _kand_mask64(A, B);
+}
+
+__mmask32 test_kandn_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kandn_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.binop(and, {{.*}}, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kandn_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: xor <32 x i1> [[L]], splat (i1 true)
+ // LLVM: and <32 x i1>
+ // LLVM: bitcast <32 x i1> {{.*}} to i32
+
+ // OGCG-LABEL: _kandn_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: xor <32 x i1>
+ // OGCG: and <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _kandn_mask32(A, B);
+}
+
+__mmask64 test_kandn_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kandn_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.binop(and, {{.*}}, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kandn_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: xor <64 x i1> [[L]], splat (i1 true)
+ // LLVM: and <64 x i1>
+ // LLVM: bitcast <64 x i1> {{.*}} to i64
+
+ // OGCG-LABEL: _kandn_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: xor <64 x i1>
+ // OGCG: and <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _kandn_mask64(A, B);
+}
+
+__mmask32 test_kor_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kor_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.binop(or, {{.*}}, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kor_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: or <32 x i1> [[L]], [[R]]
+ // LLVM: bitcast <32 x i1> {{.*}} to i32
+
+ // OGCG-LABEL: _kor_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: or <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _kor_mask32(A, B);
+}
+
+__mmask64 test_kor_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kor_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.binop(or, {{.*}}, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kor_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: or <64 x i1> [[L]], [[R]]
+ // LLVM: bitcast <64 x i1> {{.*}} to i64
+
+ // OGCG-LABEL: _kor_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: or <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _kor_mask64(A, B);
+}
+
+__mmask32 test_kxor_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kxor_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.binop(xor, {{.*}}, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kxor_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: xor <32 x i1> [[L]], [[R]]
+ // LLVM: bitcast <32 x i1> {{.*}} to i32
+
+ // OGCG-LABEL: _kxor_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: xor <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _kxor_mask32(A, B);
+}
+
+__mmask64 test_kxor_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kxor_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.binop(xor, {{.*}}, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kxor_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: xor <64 x i1> [[L]], [[R]]
+ // LLVM: bitcast <64 x i1> {{.*}} to i64
+
+ // OGCG-LABEL: _kxor_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: xor <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _kxor_mask64(A, B);
+}
+
+__mmask32 test_kxnor_mask32(__mmask32 A, __mmask32 B) {
+ // CIR-LABEL: _kxnor_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.binop(xor, {{.*}}, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _kxnor_mask32
+ // LLVM: [[L:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[R:%.*]] = bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: [[NOT:%.*]] = xor <32 x i1> [[L]], splat (i1 true)
+ // LLVM: [[RES:%.*]] = xor <32 x i1> [[NOT]], [[R]]
+ // LLVM: bitcast <32 x i1> [[RES]] to i32
+
+ // OGCG-LABEL: _kxnor_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: xor <32 x i1>
+ // OGCG: xor <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+
+ return _kxnor_mask32(A, B);
+}
+
+__mmask64 test_kxnor_mask64(__mmask64 A, __mmask64 B) {
+ // CIR-LABEL: _kxnor_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.binop(xor, {{.*}}, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _kxnor_mask64
+ // LLVM: [[L:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[R:%.*]] = bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: [[NOT:%.*]] = xor <64 x i1> [[L]], splat (i1 true)
+ // LLVM: [[RES:%.*]] = xor <64 x i1> [[NOT]], [[R]]
+ // LLVM: bitcast <64 x i1> [[RES]] to i64
+
+ // OGCG-LABEL: _kxnor_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: xor <64 x i1>
+ // OGCG: xor <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+
+ return _kxnor_mask64(A, B);
+}
+
+
+__mmask32 test_knot_mask32(__mmask32 A) {
+ // CIR-LABEL: _knot_mask32
+ // CIR: cir.cast bitcast {{.*}} : !u32i -> !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<32 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<32 x !cir.int<u, 1>> -> !u32i
+
+ // LLVM-LABEL: _knot_mask32
+ // LLVM: bitcast i32 %{{.*}} to <32 x i1>
+ // LLVM: xor <32 x i1>
+ // LLVM: bitcast <32 x i1> {{.*}} to i32
+
+ // OGCG-LABEL: _knot_mask32
+ // OGCG: bitcast i32 %{{.*}} to <32 x i1>
+ // OGCG: xor <32 x i1>
+ // OGCG: bitcast <32 x i1> {{.*}} to i32
+ return _knot_mask32(A);
+}
+
+__mmask64 test_knot_mask64(__mmask64 A) {
+ // CIR-LABEL: _knot_mask64
+ // CIR: cir.cast bitcast {{.*}} : !u64i -> !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.unary(not, {{.*}}) : !cir.vector<64 x !cir.int<u, 1>>
+ // CIR: cir.cast bitcast {{.*}} : !cir.vector<64 x !cir.int<u, 1>> -> !u64i
+
+ // LLVM-LABEL: _knot_mask64
+ // LLVM: bitcast i64 %{{.*}} to <64 x i1>
+ // LLVM: xor <64 x i1>
+ // LLVM: bitcast <64 x i1> {{.*}} to i64
+
+ // OGCG-LABEL: _knot_mask64
+ // OGCG: bitcast i64 %{{.*}} to <64 x i1>
+ // OGCG: xor <64 x i1>
+ // OGCG: bitcast <64 x i1> {{.*}} to i64
+ return _knot_mask64(A);
+}
+
+// Multiple user-level mask helpers inline to this same kmov...
[truncated]
Because ClangIR currently lacks a zext instruction, the lowering uses bool_to_int combined with an integer cast to reproduce the zero-extension behavior performed in the LLVM implementation. The resulting IR is semantically equivalent.
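A scalar analogue of that cast sequence may help make the point concrete; the types below are assumptions chosen for illustration, not the actual builtin signature:

#include <cstdint>

// The comparison yields a boolean; bool_to_int materializes it as an integer,
// and a final integral cast converts it to the builtin's result type. This has
// the same net effect as the icmp + zext pair emitted by classic codegen.
static uint32_t boolToBuiltinResult(bool cmpResult) {
  int asInt = static_cast<int>(cmpResult);   // bool_to_int
  return static_cast<uint32_t>(asInt);       // integer cast to the result type
}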
Force-pushed from c8ffb85 to 7308ac1.
xlauko left a comment:
Probably for another PR, but can we pull out mlir::Location loc = getLoc(expr->getExprLoc()) in front of the switch and use loc uniformly in all builtins?
I think it's reasonable; we can create a follow-up commit to pull out all of the getLoc(expr->getExprLoc()) calls in the existing code.
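A minimal sketch of the hoisting being discussed, using toy names rather than the real CIRGen entry point (illustration only; nothing here is part of the patch):

#include <string>

enum class MaskOpKind { And, Or, Xor };

static std::string emitMaskOp(MaskOpKind kind, int exprLine) {
  // Computed once before the switch, analogous to hoisting
  // `mlir::Location loc = getLoc(expr->getExprLoc());` out of every case.
  const std::string loc = "loc(line " + std::to_string(exprLine) + ")";
  switch (kind) {
  case MaskOpKind::And:
    return "cir.binop(and) at " + loc;
  case MaskOpKind::Or:
    return "cir.binop(or) at " + loc;
  case MaskOpKind::Xor:
    return "cir.binop(xor) at " + loc;
  }
  return "unknown op at " + loc;
}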
Force-pushed from d6e4d7c to 2500d45.
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed.
andykaylor left a comment:
lgtm
Force-pushed from 2500d45 to 0cc7a28.
…rtestz)

This patch adds CIR codegen support for the AVX512 mask test builtins on X86, including kortestc and kortestz across all supported mask widths (qi, hi, si, di). Each builtin is lowered to the expected vector<i1> mask logic and scalar comparison form in CIR, consistent with the semantics of the corresponding LLVM implementations.

Because ClangIR does not yet provide a dedicated `zext` operation, the lowering emulates zero-extension by first converting the boolean result through `bool_to_int` and then performing an integer cast to the final result type. This reproduces the `icmp` + `zext` pattern used in LLVM IR and maintains semantic equivalence.
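For context, a small usage sketch of the user-level intrinsics that sit on top of these kortest builtins; the wrapper function names below are made up for illustration, while the intrinsics themselves come from <immintrin.h> and require -mavx512bw:

#include <immintrin.h>

// True when (a | b) has every one of the 32 mask bits set; this is the
// carry-flag form handled by the kortestc lowering described above.
static bool allLanesSet(__mmask32 a, __mmask32 b) {
  return _kortestc_mask32_u8(a, b) != 0;
}

// True when (a | b) is all zeros; this is the zero-flag form handled by the
// kortestz lowering described above.
static bool noLaneSet(__mmask32 a, __mmask32 b) {
  return _kortestz_mask32_u8(a, b) != 0;
}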
Force-pushed from 0cc7a28 to ee4624c.
@andykaylor Just resolved the conflict; this is ready to be merged, thank you!
@GeneraluseAI Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
Part of #167765.
This is a follow-up patch continuing the work from #169774.
It builds on the previous lowering infrastructure and extends support to the AVX512 ktest builtins.