Add pattern matching for SVE intrinsics that operate on mask operands #114438

Merged: 15 commits, May 20, 2025

Conversation

snickolls-arm
Contributor

Introduces fgMorphTryUseAllMaskVariant for ARM64 that looks for various named intrinsics that have operands that look 'mask-like'. E.g. source operands originating from Sve.CreateTrueMask* may be recognized as masks, causing the JIT to prefer to use the predicated version of the instruction as codegen for the intrinsic. It will also inspect ConditionalSelect intrinsic nodes to match instructions with governing predicates. The transform runs during morph.

It's possible to emit the following instructions after this patch:

* ZIP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.ZipLow, Sve.ZipHigh)
* UZP{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.UnzipEven, Sve.UnzipOdd)
* TRN{1,2} <Pd>.<T>, <Pn>.<T>, <Pm>.<T> (Sve.TransposeEven, Sve.TransposeOdd)
* REV <Pd>.<T>, <Pn>.<T>                (Sve.ReverseElement)
* AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.And)
* BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.BitwiseClear)
* EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Xor)
* ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B    (Sve.Or)
* SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B      (Sve.ConditionalSelect)

Contributes towards #101970
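
For illustration, a simplified sketch of the shape of such a transform (hedged: `HasAllMaskVariant`, `GetMaskVariant`, and `canMorphAllVectorOperandsToMasks` are names from the review discussion below; `morphAllVectorOperandsToMasks` is a hypothetical helper here, and the real implementation handles more cases, e.g. ConditionalSelect):

    // Simplified sketch: if this intrinsic has an all-mask variant and every
    // vector operand is already mask-like, re-type the operands as TYP_MASK
    // and switch the node to the variant that operates on predicates.
    GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
    {
        NamedIntrinsic id = node->GetHWIntrinsicId();

        if (!HWIntrinsicInfo::HasAllMaskVariant(id) || !canMorphAllVectorOperandsToMasks(node))
        {
            return nullptr;
        }

        // Hypothetical helper: rewrites each operand to produce TYP_MASK
        // directly (e.g. by unwrapping mask-to-vector conversions).
        morphAllVectorOperandsToMasks(node);

        node->ChangeHWIntrinsicId(HWIntrinsicInfo::GetMaskVariant(id));
        node->gtType = TYP_MASK;
        return node;
    }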

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 9, 2025
@snickolls-arm
Contributor Author

@a74nh @kunalspathak

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 9, 2025
@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Apr 9, 2025
@a74nh
Contributor

a74nh commented Apr 9, 2025

It'd be nice to add some disasm tests here. But I don't think we currently can for SVE (we couldn't back in #109286 from what I remember).

AIUI, the problem is that the ARM64-FULL-LINE check has to be valid wherever it's run, and we can't just work around that by putting an if(SVE) check around it.

Has that issue gone away now that we have Cobalt in the CI?

Alternatively, could we add ARM64-SVE-FULL-LINE to the disasmcheck infrastructure?
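
For reference, the existing disasm checks are written as comments in the test sources, so the suggested variant might look like this (ARM64-SVE-FULL-LINE is the hypothetical new prefix being proposed):

    // Existing style, which must hold on any machine the test runs on:
    // ARM64-FULL-LINE: zip2 {{p[0-9]+}}.h, {{p[0-9]+}}.h, {{p[0-9]+}}.h
    //
    // Hypothetical SVE-gated variant, only checked when SVE is present:
    // ARM64-SVE-FULL-LINE: zip2 {{p[0-9]+}}.h, {{p[0-9]+}}.h, {{p[0-9]+}}.h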

//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
{
    return gtNewSimdHWIntrinsicNode(TYP_MASK, NI_Sve_CreateFalseMaskByte, CORINFO_TYPE_BYTE, simdSize);
}
Contributor

I'm not sure about this line.

It should be a switch on the type (case Byte: NI_Sve_CreateFalseMaskByte; case Int32: NI_Sve_CreateFalseMaskInt32; and so on) to keep to the hwintrinsiclistarm64sve.h interface.

However, regardless of which is used, it'll still produce the same pfalse instruction.

Alternatively, add a NI_Sve_CreateFalseMaskAll similar to NI_Sve_CreateTrueMaskAll which can take any type. But that requires adding support in a few additional files.
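
A sketch of that suggestion (hedged: the CorInfoType parameter and exact case labels are illustrative, and the case list is abbreviated; the helper signature follows the diff above):

    GenTree* Compiler::gtNewSimdAllFalseMaskNode(CorInfoType simdBaseJitType, unsigned simdSize)
    {
        NamedIntrinsic intrinsic = NI_Illegal;
        switch (simdBaseJitType)
        {
            case CORINFO_TYPE_BYTE:
                intrinsic = NI_Sve_CreateFalseMaskByte;
                break;
            case CORINFO_TYPE_INT:
                intrinsic = NI_Sve_CreateFalseMaskInt32;
                break;
            // ... one case per element type in hwintrinsiclistarm64sve.h ...
            default:
                unreached();
        }
        // Every variant still produces the same pfalse instruction.
        return gtNewSimdHWIntrinsicNode(TYP_MASK, intrinsic, simdBaseJitType, simdSize);
    }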

@kunalspathak
Member

Seems there's some test failure:

Beginning scenario: ConditionalSelect_FalseOp_all - operation in FalseValue

Assert failure(PID 9260 [0x0000242c], Thread: 8020 [0x1f54]): Assertion failed 'ins != INS_invalid' in 'JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_AbsoluteCompareGreaterThan_float:ConditionalSelect_ZeroOp():this' during 'Generate code' (IL size 231; hash 0xca76bdfb; FullOpts)

    File: D:\a\_work\1\s\src\coreclr\jit\hwintrinsiccodegenarm64.cpp:363
    Image: C:\h\w\A54C0924\p\corerun.exe

@snickolls-arm
Contributor Author

@kunalspathak I have fixed the test and some other build issues; this should be ready for review now.

Member

@kunalspathak kunalspathak left a comment

Added some questions/suggestions.

{
    switch (GetHWIntrinsicId())
    {
        // ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>
Member

Wondering if we should add a HW_Flag_AllMaskVariant for this?

Contributor Author

I didn't want to use flag space, as this list is unlikely to grow; these were the only instructions I could find that follow this pattern across all versions of SVE. But it might make it easier to apply this transform to other intrinsics in the future if we find that other patterns work too.

//
GenTree* Compiler::fgMorphTryUseAllMaskVariant(GenTreeHWIntrinsic* node)
{
    if (node->HasAllMaskVariant() && canMorphAllVectorOperandsToMasks(node))
Member

It makes sense to have node->HasAllMaskVariant() inside canMorphAllVectorOperandsToMasks() itself. That way you can (and should) exercise it for ConditionalSelect's left operand too.

Contributor Author

I agree with this change if all of these intrinsics have HasAllMaskVariant() == true, but I don't think that works; see my comment below.

Member

If #114438 (comment) works, then consider tagging them with HW_Flag_AllMaskVariant and moving HasAllMaskVariant() inside canMorphAllVectorOperandsToMasks. Having HW_Flag_AllMaskVariant in the table helps with easy discoverability of the various flags in one place.

        // BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        // EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        // ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B
        case NI_Sve_And:
Member

I think these too should be marked as HW_Flag_AllMaskVariant and looked for in HasAllMaskVariant() itself.

Contributor Author

I tried grouping these intrinsics with the others initially, but it doesn't work because these should only be considered in relation to a ConditionalSelect. Grouping them with the others causes the transformation to run when a ConditionalSelect is not present, which wouldn't be correct for these instructions because they require the mask parameter for the governing predicate.

Member

We wrap the IR nodes that have embedded mask semantics, like And, inside a ConditionalSelect during lowering, which runs well after the morph phase where you are doing this optimization. See if (HWIntrinsicInfo::IsEmbeddedMaskedOperation(intrinsicId)) in LowerHWIntrinsic(). Until then, they continue to hold vector operands. If you do the transformation here for IR nodes that have And(mask, mask), it shouldn't prohibit us from wrapping it in ConditionalSelect in lowering.

Contributor Author

Currently it's marked HW_Flag_OptionalEmbeddedMaskedOperation, so I think this wrapping isn't occurring for this intrinsic. When I try to implement it like this, it changes all the operands to masks and then tries to emit AND <Zd>.D, <Zn>.D, <Zm>.D, and runs into this assert because the register types are wrong:

assert(isVectorRegister(reg3)); // ddddd

The mask variant of this intrinsic has an embedded mask, but it's required for this instruction rather than optional, so there would also need to be some handling of this edge case in codegen to make sure it definitely wraps the mask variant in a ConditionalSelect. It feels like there should be a separate set of flags for when the intrinsic is TYP_MASK vs. TYP_SIMD, e.g. HW_Flag_MaskVariant(Optional)EmbeddedMaskOperation, etc.

Member

@kunalspathak kunalspathak Apr 25, 2025

I think we should have a separate intrinsic And_Predicates (and likewise for other APIs that have a predicate variant). These are added in the section "Special intrinsics that are generated during importing or lowering". And_Predicates should have HW_Flag_EmbeddedMaskedOperation. We can have the flag HW_Flag_AllMaskVariant on the Sve_And intrinsic to detect in morph whether it can be transformed into the And_Predicates variant.

We come here in morph and see And(Vector, Vector). If the operands are masks, we can transform the node into And_Predicates(Mask, Mask). During lowering, we can then transform it into CndSel(AllTrue, And_Predicates(Mask, Mask), Zero), and codegen will handle generating the predicated version of AND.
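
A sketch of that flow with the IR shapes written out (illustrative only; And_Predicates is the proposed new intrinsic, and the names are taken from the comment above):

    // In morph: an And(Vector, Vector) whose operands are both masks is
    // re-typed to operate on predicates directly:
    //
    //   NI_Sve_And(op1, op2)               : TYP_SIMD16
    //     -> NI_Sve_And_Predicates(m1, m2) : TYP_MASK
    //
    // In lowering: And_Predicates carries HW_Flag_EmbeddedMaskedOperation,
    // so it is wrapped in a conditional select with an all-true governing
    // predicate:
    //
    //   CndSel(CreateTrueMaskAll, And_Predicates(m1, m2), Zero)
    //
    // Codegen then emits the predicated instruction:
    //
    //   AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B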

Contributor Author

This sounds much better than what I was thinking; I'll try to implement it.

Contributor Author

I've pushed this implementation; it seems simpler. N.B. we've run out of flag space now, as I've taken the last bit for HW_Flag_HasMaskVariant. We might need to think about expanding the flag space for SVE2.

Contributor

> I've pushed this implementation; it seems simpler. N.B. we've run out of flag space now, as I've taken the last bit for HW_Flag_HasMaskVariant. We might need to think about expanding the flag space for SVE2.

Added #115474 to track this

// Return Value:
// The mask
//
GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
Member

Suggested change:

    - GenTree* Compiler::gtNewSimdAllFalseMaskNode(unsigned simdSize)
    + GenTree* Compiler::gtNewSimdFalseMaskByteNode(unsigned simdSize)

@@ -9218,6 +9218,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
}
}

#ifdef TARGET_ARM64
    optimizedTree = fgMorphTryUseAllMaskVariant(node);
    if (optimizedTree != nullptr)
Member

Having it here might be preventing the node from getting further transformations/optimizations. Should this be done towards the end of this method?

Contributor Author

This seems fine; I've moved it later in the method.

@snickolls-arm
Contributor Author

These latest failures aren't related to this patch, if I'm not mistaken: just some build and test failures on other platforms.

@kunalspathak
Member

/azp run runtime-coreclr superpmi-diffs

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

        return (flags & HW_Flag_HasAllMaskVariant) != 0;
    }

    static NamedIntrinsic GetMaskVariant(NamedIntrinsic id)
Member

I know a lot of methods in this file don't have a method summary, but we try to add them for newly added methods, so please add a line or two for them.

@kunalspathak
Member

Can you share the code generated for some of the test cases you have added?

@kunalspathak kunalspathak added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 14, 2025
@snickolls-arm
Contributor Author

> Can you share the code generated for some of the test cases you have added?

For example:

        public static unsafe void Main()
        {
            System.Console.WriteLine(
                Sve.ZipHigh(Sve.CreateTrueMaskInt16(), Sve.CreateFalseMaskInt16())
            );
        }

Previously generated:

...
IN0005: 000018      ptrue   p0.h
IN0006: 00001C      mov     z16.h, p0/z, #1
IN0007: 000020      pfalse  p0.b
IN0008: 000024      mov     z17.h, p0/z, #1
IN0009: 000028      zip2    z16.h, z16.h, z17.h
...

and now generates:

...
IN0005: 000018      ptrue   p0.h
IN0006: 00001C      pfalse  p1.b
IN0007: 000020      zip2    p0.h, p0.h, p1.h
IN0008: 000024      mov     z16.h, p0/z, #1
...

So we avoid the unnecessary conversions on the operands (the conversion of the result is still required as the vector is printed afterwards).

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 14, 2025
    static bool HasAllMaskVariant(NamedIntrinsic id)
    {
        const HWIntrinsicFlag flags = lookupFlags(id);
        return (flags & HW_Flag_HasAllMaskVariant) != 0;
    }

    // GetMaskVariant: Given an intrinsic that has a variant that operates on mask types, return the ID of
    //                 this variant intrinsic. Call HasAllMaskVariant before using this function, as it will
Member

Maybe in a follow-up PR, add assert(HasAllMaskVariant(id));

Member

Actually, there is a build failure, so can you fix this as well, along with the build error?

/__w/1/s/src/coreclr/jit/morpharm64.cpp:99:37: error: comparison of integer expressions of different signedness: ‘size_t’ {aka ‘long unsigned int’} and ‘int’ [-Werror=sign-compare]
     99 |         if (node->GetOperandCount() == numArgs)
        |             ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
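
A minimal fix for that warning is to make the operand types agree, e.g.:

    // GetOperandCount() returns size_t, so compare against an unsigned value.
    if (node->GetOperandCount() == static_cast<size_t>(numArgs))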

@@ -9679,6 +9679,15 @@ GenTree* Compiler::fgOptimizeHWIntrinsic(GenTreeHWIntrinsic* node)
}
}

#ifdef TARGET_ARM64
Contributor

Starting on line 9364 there is some code doing something very similar to this PR.

It's all wrapped in:

if (GenTreeHWIntrinsic::OperIsBitwiseHWIntrinsic(oper))

And then does nothing on Arm64 because maskIntrinsicId is only set on TARGET_XARCH.

This PR's fgMorphTryUseAllMaskVariant() is effectively doing the Arm64 version. It doesn't fit into the existing code because Arm64 has more than just bitwise operations that have all-mask variants (e.g. Zip).

Curiously, OperIsBitwiseHWIntrinsic() is only ever used by XARCH.


At a minimum, all the existing code should be put into an X64 version of fgMorphTryUseAllMaskVariant() and then called at line 9364 for both Arm64 and X64, instead of having it at the end of the function.

Better, I think, would be to just fill in the #elif defined(TARGET_ARM64) part and ensure the rest of that section works for Arm64. I'm not sure whether it would, or whether it's incompatible.

Even better would be to also replace OperIsBitwiseHWIntrinsic() with:

    static bool HasAllMaskVariant(NamedIntrinsic id)
    {
#if defined(TARGET_XARCH)
        // (xarch currently checks the bitwise oper rather than the intrinsic
        // id, so this branch would need the oper derived from `id`)
        return (oper == GT_AND) || (oper == GT_AND_NOT) || (oper == GT_NOT) || (oper == GT_OR) ||
               (oper == GT_OR_NOT) || (oper == GT_XOR) || (oper == GT_XOR_NOT);
#elif defined(TARGET_ARM64)
        const HWIntrinsicFlag flags = lookupFlags(id);
        return (flags & HW_Flag_HasAllMaskVariant) != 0;
#endif
    }

Contributor Author

I've implemented this by moving both the XARCH and ARM64 code into another function that has two conditionally compiled bodies. It should now be clearer which part of the code is trying to optimize for masked intrinsics, on both architectures.

// canMorphVectorOperandToMask: Can this vector operand be converted to a
// node with type TYP_MASK easily?
//
bool Compiler::canMorphVectorOperandToMask(GenTree* node)
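
A plausible body for this predicate, as a hedged sketch (varTypeIsMask and OperIsConvertMaskToVector are existing JIT helpers; the PR's actual checks may differ):

    bool Compiler::canMorphVectorOperandToMask(GenTree* node)
    {
        // Cheaply convertible: the operand is already a mask, or it is a
        // mask wrapped in a mask-to-vector conversion we can simply peel off.
        return varTypeIsMask(node) || node->OperIsConvertMaskToVector();
    }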
Member

Just one suggestion: have it in morph.cpp itself instead of creating a new file. We have a lot of xarch and arm64 code scattered in morph.cpp, and if we truly want morpharm64.cpp, we should move that too (someday).

@kunalspathak
Member

Also, it looks like some bug was introduced in the recent refactoring, because of which the tests are failing. PTAL.

@kunalspathak kunalspathak added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 16, 2025
@snickolls-arm
Contributor Author

> Also, it looks like some bug was introduced in the recent refactoring, because of which the tests are failing. PTAL.

I haven't been able to reproduce this test failure in my x64 environment. Is there a way I can see the exact environment of this run? I'm probably missing some stress configuration.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 19, 2025
@kunalspathak
Member

> Also, it looks like some bug was introduced in the recent refactoring, because of which the tests are failing. PTAL.

> I haven't been able to reproduce this test failure in my x64 environment. Is there a way I can see the exact environment of this run? I'm probably missing some stress configuration.

Yeah, that could be because it needs AVX-512 support. I have fixed the problem; it was a minor refactoring issue.

@kunalspathak
Member

Build failure is from #115767.

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak
Member

/ba-g known build failure

@kunalspathak kunalspathak merged commit 6d3b842 into dotnet:main May 20, 2025
108 of 114 checks passed
SimaTian pushed a commit that referenced this pull request May 27, 2025
Add pattern matching for SVE intrinsics that operate on mask operands (#114438)

* Add pattern matching for SVE intrinsics that operate on mask operands

* Fix test failure and add FileCheck tests

* Don't run tests on OSX

* Don't run tests for Mono

* Move the transform later in fgOptimizeHWIntrinsic

* Rename gtNewSimdAllFalseMaskNode

* Re-design using HW_Flag_AllMaskVariant

* Add missing function documentation in hwintrinsic.h

* Fix integer comparison and add assertion

* Refactor to follow similar path to XARCH

* fix the refactoring

* jit formatting

* Move code into morph.cpp

---------

Co-authored-by: Kunal Pathak <[email protected]>