Add pattern matching for SVE intrinsics that operate on mask operands #114438
Conversation
It'd be nice to add some disasm tests here. But I don't think we currently can for SVE (we couldn't back in #109286 from what I remember). AIUI, the problem is … Has that issue gone away now we have cobalt in the CI? Alternatively, could we add …
src/coreclr/jit/hwintrinsicarm64.cpp (Outdated)

```cpp
//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
{
    return gtNewSimdHWIntrinsicNode(TYP_MASK, NI_Sve_CreateFalseMaskByte, CORINFO_TYPE_BYTE, simdSize);
```
I'm not sure about this line. It should be `switch (type) { case byte: NI_Sve_CreateFalseMaskByte; case Int32: NI_Sve_CreateFalseMaskInt32; ... }` etc., to keep to the hwintrinsiclistarm64sve.h interface. However, regardless of which is used, it'll still produce the same `pfalse` instruction. Alternatively, add a `NI_Sve_CreateFalseMaskAll` similar to `NI_Sve_CreateTrueMaskAll`, which can take any type. But that requires adding support in a few additional files. (A sketch of the switch-based option is shown below.)
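For illustration only, a minimal sketch of the switch-based option described above, assuming a hypothetical `simdBaseJitType` parameter (the helper in this diff takes only `simdSize`):

```cpp
// Hypothetical sketch: dispatch on the base element type so the helper
// lines up with the per-type entries in hwintrinsiclistarm64sve.h.
GenTree* Compiler::gtNewSimdAllFalseMaskNode(CorInfoType simdBaseJitType, unsigned simdSize)
{
    NamedIntrinsic intrinsic = NI_Illegal;
    switch (simdBaseJitType)
    {
        case CORINFO_TYPE_BYTE:
            intrinsic = NI_Sve_CreateFalseMaskByte;
            break;
        case CORINFO_TYPE_INT:
            intrinsic = NI_Sve_CreateFalseMaskInt32;
            break;
        // ... remaining element types elided ...
        default:
            unreached();
    }
    // Every per-type variant still lowers to the same 'pfalse' instruction.
    return gtNewSimdHWIntrinsicNode(TYP_MASK, intrinsic, simdBaseJitType, simdSize);
}
```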
Seems there is some test failure.
Introduces `fgMorphTryUseAllMaskVariant` for ARM64 that looks for various named intrinsics that have operands that look 'mask-like'. E.g. source operands originating from Sve.CreateTrueMask* may be recognized as masks, causing the JIT to prefer to use the predicated version of the instruction as codegen for the intrinsic. It will also inspect ConditionalSelect intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.ZipLow, Sve.ZipHigh)
* UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.UnzipEven, Sve.UnzipOdd)
* TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.TransposeEven, Sve.TransposeOdd)
* REV <Pd>.<T>, <Pn>.<T> (Sve.ReverseElement)
* AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.And)
* BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.BitwiseClear)
* EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.Xor)
* ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.Or)
* SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B (Sve.ConditionalSelect)

Contributes towards dotnet#101970
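As an illustration of the pattern being matched (hypothetical IR shapes, not verbatim JIT dumps):

```cpp
// A ZipLow whose operands are both recognizable as masks (e.g. produced by
// Sve.CreateTrueMask*) can be morphed to operate on predicate registers:
//
//   ZipLow(CreateTrueMaskInt32(...), CreateTrueMaskInt32(...))   // TYP_SIMD operands
//     => ZIP1 <Pd>.S, <Pn>.S, <Pm>.S                             // predicate form
//
// rather than converting each mask to a vector and emitting the Z-register
// form ZIP1 <Zd>.S, <Zn>.S, <Zm>.S.
```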
Force-pushed from ddb2472 to c5922a1 (compare)
@kunalspathak I have fixed the test and some other build issues, this should be ready for review now.
Added some questions/suggestions.
src/coreclr/jit/morpharm64.cpp (Outdated)

```cpp
{
    switch (GetHWIntrinsicId())
    {
        // ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>
```
Wondering if we should add a `HW_Flag_AllMaskVariant` for this?
I didn't want to use flag space, as this list is unlikely to grow; these were the only instructions I could find that follow this pattern across all versions of SVE. But it might make it easier to apply this transform to other intrinsics in the future if we find other patterns that work too.
src/coreclr/jit/morpharm64.cpp (Outdated)

```cpp
//
GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
{
    if (node->HasAllMaskVariant() && canMorphAllVectorOperandsToMasks(node))
```
It makes sense to have `node->HasAllMaskVariant()` inside `canMorphAllVectorOperandsToMasks()` itself. That way you can (and should) exercise it for ConditionalSelect's left operand too.
I agree with this change if all of these intrinsics have `HasAllMaskVariant() == true`, but I don't think this works; see my comment below.
If #114438 (comment) works, then consider tagging them with `HW_Flag_AllMaskVariant` and moving the `HasAllMaskVariant()` check inside `canMorphAllVectorOperandsToMasks`. Having `HW_Flag_AllMaskVariant` in the table helps with easy discoverability of the various flags in one place.
src/coreclr/jit/morpharm64.cpp (Outdated)

```cpp
        // BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        // EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        // ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        case NI_Sve_And:
```
I think these too should be marked as `HW_Flag_AllMaskVariant` and looked for in `HasAllMaskVariant()` itself.
I tried grouping these intrinsics with the others initially, but it doesn't work, because these ones should only be considered in relation to a `ConditionalSelect`. Grouping them with the others causes the transformation to run when a `ConditionalSelect` is not present, which wouldn't be correct for these instructions because they require the mask parameter for the governing predicate.
We wrap IR nodes that have embedded mask semantics, like `And`, inside a `ConditionalSelect` during lowering, which runs well after the morph phase where you are doing this optimization. See `if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId))` in `LowerHWIntrinsic()`. Until then, they continue to hold `Vector` operands. If you do the transformation here for IR nodes that have `And(mask, mask)`, it shouldn't prohibit us from wrapping them in a `ConditionalSelect` in lowering.
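For reference, a rough sketch of the lowering step being described (paraphrased shape, not the exact code):

```cpp
// In LowerHWIntrinsic(): intrinsics whose instruction requires a governing
// predicate get wrapped so codegen always sees a ConditionalSelect.
if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId))
{
    // node  =>  ConditionalSelect(allTrueMask, node, zeroVector)
    // giving the embedded-mask instruction an explicit governing predicate.
}
```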
Currently it's marked `HW_Flag_OptionalEmbeddedMaskedOperation`, so I think this wrapping isn't occurring for this intrinsic. When I try to implement it like this, it changes all the operands to masks and then tries to emit `AND <Zd>.D, <Zn>.D, <Zm>.D` and runs into this assert because the register types are wrong:

src/coreclr/jit/emitarm64sve.cpp, line 2912 (at c5922a1):

```cpp
assert(isVectorRegister(reg3)); // ddddd
```

The mask variant of this intrinsic has an embedded mask, but it's required for this instruction instead of optional, so there would also need to be some handling of this edge case in codegen to make sure it definitely wraps the mask variant in `ConditionalSelect`. It feels like there should be a separate set of flags for when the intrinsic is `TYP_MASK` or `TYP_SIMD`, e.g. `HW_Flag_MaskVariant(Optional)EmbeddedMaskOperation`, etc.
I think we should have a separate intrinsic `And_Predicates` (and likewise for other APIs that have a predicate variant). They would be added in the section "Special intrinsics that are generated during importing or lowering". `And_Predicates` should have `HW_Flag_EmbeddedMaskedOperation`. We can have the flag `HW_Flag_AllMaskVariant` on the `SVE_And` intrinsic, to detect in morph whether it can be transformed into the `And_Predicates` variant.

We come here in morph and see `And(Vector, Vector)`. If the operands are masks, we can transform the node into `And_Predicates(Mask, Mask)`. During lowering, we can then transform it into `CndSel(AllTrue, And_Predicates(Mask, Mask), Zero)`, and codegen will handle generating the predicated version of `AND` (predicates). The whole pipeline is sketched below.
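Summarizing the proposed pipeline as an IR sketch (shapes only, not actual JIT dumps):

```cpp
// Importing:  And(Vector, Vector)                   // operands happen to be masks
// Morph:      And_Predicates(Mask, Mask)            // via HW_Flag_AllMaskVariant
// Lowering:   CndSel(AllTrue, And_Predicates(Mask, Mask), Zero)
// Codegen:    AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    // predicated form
```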
This sounds much better than what I was thinking; I'll try to implement this.
I've pushed this implementation; it seems simpler. N.B. we've run out of flag space now, as I've taken the last bit for `HW_Flag_HasMaskVariant`. We might need to think about expanding the flag space for SVE2.
Added #115474 to track this
src/coreclr/jit/hwintrinsicarm64.cpp
Outdated
// Return Value: | ||
// The mask | ||
// | ||
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize) |
Suggested change:

```diff
-GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
+GenTree* Compiler::gtNewSimdFalseMaskByteNode(unsigned simdSize)
```
src/coreclr/jit/morph.cpp (Outdated)

```diff
@@ -9218,6 +9218,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
         }
     }

+#ifdef TARGET_ARM64
+    optimizedTree = fgMorphTryUseAllMaskVariant(node);
+    if (optimizedTree != nullptr)
```
Having it here might be preventing the node from getting further transformations/optimizations. Should this be done towards the end of this method?
This seems fine; I've moved it later in the method.
These latest failures aren't related to this patch, if I'm not mistaken. Some build and test failures on other platforms.
/azp run runtime-coreclr superpmi-diffs
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.
```cpp
    return (flags & HW_Flag_HasAllMaskVariant) != 0;
}

static NamedIntrinsic GetMaskVariant(NamedIntrinsic id)
```
I know a lot of methods in this file don't have a method summary, but we try to add them for newly added methods. So please add a line or two for them.
Can you share the code generated for some of the test cases you have added?
For example, previously it generated: … and now it generates: … So we avoid the unnecessary conversions on the operands (the conversion of the result is still required, as the vector is printed afterwards).
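The actual listings did not survive extraction; purely as a hypothetical illustration of the shape of the improvement (not real JIT output):

```cpp
// before: each mask operand is converted to a vector, the vector form of the
//         instruction is used, and the result is converted for printing
//   <convert mask -> vector>  x2
//   zip1 z0.s, z1.s, z2.s
//   <convert result for printing>
//
// after: the predicate form consumes the masks directly
//   zip1 p0.s, p1.s, p2.s
//   <convert result for printing>   // still required, as noted above
```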
```cpp
static bool HasAllMaskVariant(NamedIntrinsic id)
{
    const HWIntrinsicFlag flags = lookupFlags(id);
    return (flags & HW_Flag_HasAllMaskVariant) != 0;
}

// GetMaskVariant: Given an intrinsic that has a variant that operates on mask types, return the ID of
//                 this variant intrinsic. Call HasAllMaskVariant before using this function, as it will …
```
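A hedged sketch of what the mapping might look like (hypothetical body; the exact set of cases and `*_Predicates` names may differ from the PR):

```cpp
static NamedIntrinsic GetMaskVariant(NamedIntrinsic id)
{
    // Callers are expected to have checked HasAllMaskVariant(id) first.
    switch (id)
    {
        case NI_Sve_And:
            return NI_Sve_And_Predicates;
        case NI_Sve_BitwiseClear:
            return NI_Sve_BitwiseClear_Predicates;
        // ... other intrinsics tagged HW_Flag_HasAllMaskVariant ...
        default:
            unreached();
    }
}
```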
Maybe in a follow-up PR, add `assert(HasAllMaskVariant(id));`
Actually, there is a build failure, so can you fix this as well, along with the build error?

```
/__w/1/s/src/coreclr/jit/morpharm64.cpp:99:37: error: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Werror=sign-compare]
   99 |     if (node->GetOperandCount() == numArgs)
      |         ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
```
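One possible fix (assuming `numArgs` is a non-negative `int`, as the warning suggests) is to make both sides of the comparison the same type:

```cpp
// Cast the signed count so both sides compare as size_t.
if (node->GetOperandCount() == static_cast<size_t>(numArgs))
```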
src/coreclr/jit/morph.cpp (Outdated)

```diff
@@ -9679,6 +9679,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
         }
     }

+#ifdef TARGET_ARM64
```
Starting at line 9364 there is some code doing something very similar to this PR. It's all wrapped in `if (GenTreeHWIntrinsic::OperIsBitwiseHWIntrinsic(oper))` and then does nothing on Arm64, because `maskIntrinsicId` is only set on `TARGET_XARCH`.

This PR's `fgMorphTryUseAllMaskVariant()` is effectively doing the Arm64 version. It doesn't fit into the existing code because Arm64 has more than just bitwise operations that have all-mask variants (e.g. Zip).

Curiously, `OperIsBitwiseHWIntrinsic()` is only ever used by XARCH.

At a minimum, all the existing code should be put into an X64 version of `fgMorphTryUseAllMaskVariant()` and then called at line 9364 for Arm64 and X64, instead of having it at the end of the function.

Better, I think, would be to just fill in the `#elif defined(TARGET_ARM64)` part and ensure the rest of that section works for Arm64. I'm not sure whether it would or whether it's incompatible.

Even better would be to also replace `OperIsBitwiseHWIntrinsic()` with something like:

```cpp
static bool HasAllMaskVariant(genTreeOps oper, NamedIntrinsic id)
{
#if defined(TARGET_XARCH)
    return (oper == GT_AND) || (oper == GT_AND_NOT) || (oper == GT_NOT) || (oper == GT_OR) ||
           (oper == GT_OR_NOT) || (oper == GT_XOR) || (oper == GT_XOR_NOT);
#elif defined(TARGET_ARM64)
    const HWIntrinsicFlag flags = lookupFlags(id);
    return (flags & HW_Flag_HasAllMaskVariant) != 0;
#else
    return false;
#endif
}
```
I've implemented this by moving both XARCH and ARM64 into another function that has two conditionally compiled bodies. It should be clearer now which part of the code is trying to optimize for masked intrinsics, for both architectures.
src/coreclr/jit/morpharm64.cpp (Outdated)

```cpp
// canMorphVectorOperandToMask: Can this vector operand be converted to a
//                              node with type TYP_MASK easily?
//
bool Compiler::canMorphVectorOperandToMask(GenTree* node)
```
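A minimal sketch of what such a check might look like (assumed body for illustration; the real implementation may recognize more shapes):

```cpp
bool Compiler::canMorphVectorOperandToMask(GenTree* node)
{
    // Trivially convertible: the node already produces a mask, or it is a
    // vector created directly from a mask whose conversion can be peeled off.
    return varTypeIsMask(node) || node->OperIsConvertMaskToVector();
}
```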
Just one suggestion: have it in `morph.cpp` itself instead of creating a new file. We have a lot of xarch and arm64 code scattered in morph.cpp, and if we truly want `morpharm64.cpp`, we should move that too (someday).
Also, looks like there was some bug introduced in recent refactoring because of which the tests are failing. PTAL
I haven't been able to reproduce this test failure in my x64 environment; is there a way I can see the exact environment of this run? I'm probably missing some stress configuration.
Yeah, that could be because it needs AVX-512 support. I have fixed the problem; it was minor refactoring related.
Build failure is from #115767.
LGTM
/ba-g known build failure
Add pattern matching for SVE intrinsics that operate on mask operands (#114438)

* Add pattern matching for SVE intrinsics that operate on mask operands

Introduces `fgMorphTryUseAllMaskVariant` for ARM64 that looks for various named intrinsics that have operands that look 'mask-like'. E.g. source operands originating from Sve.CreateTrueMask* may be recognized as masks, causing the JIT to prefer to use the predicated version of the instruction as codegen for the intrinsic. It will also inspect ConditionalSelect intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.ZipLow, Sve.ZipHigh)
* UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.UnzipEven, Sve.UnzipOdd)
* TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.TransposeEven, Sve.TransposeOdd)
* REV <Pd>.<T>, <Pn>.<T> (Sve.ReverseElement)
* AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.And)
* BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.BitwiseClear)
* EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.Xor)
* ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B (Sve.Or)
* SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B (Sve.ConditionalSelect)

Contributes towards #101970

* Fix test failure and add FileCheck tests
* Don't run tests on OSX
* Don't run tests for Mono
* Move the transform later in fgOptimizeHWIntrinsic
* Rename gtNewSimdAllFalseMaskNode
* Re-design using HW_Flag_AllMaskVariant
* Add missing function documentation in hwintrinsic.h
* Fix integer comparison and add assertion
* Refactor to follow similar path to XARCH
* fix the refactoring
* jit formatting
* Move code into morph.cpp

---------

Co-authored-by: Kunal Pathak <[email protected]>