Implement SVE2 ShiftLeftAndInsert #115776

Merged: kunalspathak merged 2 commits into dotnet:main on May 21, 2025

Conversation

snickolls-arm
Contributor

@snickolls-arm snickolls-arm commented May 20, 2025

@a74nh @kunalspathak

Contributes to #115479

@@ -552,6 +552,11 @@ void HWIntrinsicInfo::lookupImmBounds(
immUpperBound = 7;
break;

case NI_Sve2_ShiftLeftAndInsert:
Member

You shouldn't need this case: the bounds should already be calculated at line 411 above, under HW_Category_ShiftLeftByImmediate.
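For context, a left-shift immediate is valid from 0 up to the element width in bits minus 1, so the bounds follow from the element type alone and do not need a per-intrinsic case. A minimal C++ sketch of that rule is below. The names `ImmBounds` and `shiftLeftImmBounds` are hypothetical and chosen for illustration; this is not the JIT's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types and helper, for illustration only (not the JIT's code).
struct ImmBounds
{
    int lower;
    int upper;
};

// A left-shift immediate ranges from 0 to (element width in bits - 1).
constexpr ImmBounds shiftLeftImmBounds(std::size_t elemSizeBytes)
{
    return ImmBounds{0, static_cast<int>(elemSizeBytes * 8) - 1};
}

// 4-byte (int) elements give an upper bound of 31, which is consistent
// with the `cmp w0, #32` guard that appears in the disassembly in this thread.
static_assert(shiftLeftImmBounds(4).upper == 31, "int lanes: 0..31");
static_assert(shiftLeftImmBounds(1).upper == 7, "byte lanes: 0..7");
```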

@kunalspathak
Member

can you share sample disassembly for 3 scenarios?

  • When shift is constant and within bounds
  • When shift is constant and out of bounds
  • When shift is variable

@kunalspathak kunalspathak added the needs-author-action and arm-sve labels May 20, 2025
@snickolls-arm
Contributor Author

can you share sample disassembly for 3 scenarios?

  • When shift is constant and within bounds
  • When shift is constant and out of bounds
  • When shift is variable

Scenario 1 generates the following instruction:

// Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, 21));
...
IN0009: 00002C      sli     z16.s, z8.s, #21
...
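As a reminder of what `sli` computes per lane: the source element is shifted left by the immediate and inserted into the destination element, whose low `shift` bits are preserved. The sketch below is a scalar C++ model of one 32-bit lane, assuming `left` maps to the destination register and `right` to the shifted source (as the `sli z0.s, z1.s, #N` forms in this thread suggest). The function name `sliLane` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of SLI for a single 32-bit lane (illustrative only).
// Precondition: shift is in [0, 31].
constexpr std::uint32_t sliLane(std::uint32_t dest, std::uint32_t src, unsigned shift)
{
    // The low `shift` bits of the destination survive the insert.
    const std::uint32_t keepMask = (std::uint32_t{1} << shift) - 1;
    return (dest & keepMask) | (src << shift);
}
```

With `dest = 0xFFFFFFFF`, `src = 1` and `shift = 21`, the low 21 bits of `dest` are kept and bit 21 comes from `src`, giving `0x003FFFFF`. With the zero-initialized vectors used in the dump tests, every lane is 0.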

Scenarios 2 and 3 both throw an exception at run time (scenario 3 only when the variable's value is out of range):

// Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, 35));

System.TypeInitializationException: The type initializer for 'JIT.HardwareIntrinsics.Arm._Sve2.Program' threw an exception.
 ---> System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
   at System.Runtime.Intrinsics.Arm.Sve2.ShiftLeftAndInsert(Vector`1 left, Vector`1 right, Byte shift)

I can't see the exact code generated for this, because the intrinsic is no longer expanded inline. The function contains a call to the intrinsic, which I'm guessing contains the jump to the throw helper.

Scenario 3 generates this instruction when the variable is in range:

//  byte a = 10;
//  Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, a));
...
IN0012: 000050      sli     z16.s, z8.s, #10
...

The rest of the C# for reference (I dumped these by adding them to the test suite):

        public static void Dump_Test_1()
        {
            var u = new Vector<int>();
            var v = new Vector<int>();
            Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, 21));
        }

        public static void Dump_Test_2()
        {
            var u = new Vector<int>();
            var v = new Vector<int>();
            Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, 35));
        }

        public static void Dump_Test_3()
        {
            var u = new Vector<int>();
            var v = new Vector<int>();
            var rand = new Random();
            byte a = (byte)rand.Next();
            Console.WriteLine(Sve2.ShiftLeftAndInsert(u, v, a));
        }

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action label May 21, 2025
@kunalspathak
Member

can you try this?

        // Assumes u and v are Vector<int> values in scope (for example, static fields).
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static Vector<int> Dump_Test_2()
        {
            return Sve2.ShiftLeftAndInsert(u, v, 35);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static Vector<int> Dump_Test_3(byte a)
        {
            return Sve2.ShiftLeftAndInsert(u, v, a);
        }

@snickolls-arm
Contributor Author

can you try this?

        // Assumes u and v are Vector<int> values in scope (for example, static fields).
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static Vector<int> Dump_Test_2()
        {
            return Sve2.ShiftLeftAndInsert(u, v, 35);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static Vector<int> Dump_Test_3(byte a)
        {
            return Sve2.ShiftLeftAndInsert(u, v, a);
        }

The top one is a function that just branches straight to CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION.

The second one has this:

; Total bytes of code 64, prolog size 8, PerfScore 12.00, instruction count 16, allocated bytes for code 64 (MethodHash=131eba93) for method JIT.HardwareIntrinsics.Arm.Program:Dump_Test_3(ubyte):System.Numerics.Vector`1[int] (FullOpts)
; ============================================================

*************** After end code gen, before unwindEmit()
G_M17772_IG01:        ; func=00, offs=0x000000, size=0x0008, bbWeight=1, PerfScore 1.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN000d: 000000      stp     fp, lr, [sp, #-0x10]!
IN000e: 000004      mov     fp, sp

G_M17772_IG02:        ; offs=0x000008, size=0x0028, bbWeight=1, PerfScore 8.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref, isz

IN0001: 000008      uxtb    w0, w0
IN0002: 00000C      cmp     w0, #32
IN0003: 000010      bhs     G_M17772_IG04
IN0004: 000014      movi    v1.4s, #0
IN0005: 000018      movi    v0.4s, #0
IN0006: 00001C      movz    x1, #0xD3E0      // code for System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftAndInsert(System.Numerics.Vector`1[int],System.Numerics.Vector`1[int],ubyte):System.Numerics.Vector`1[int]
IN0007: 000020      movk    x1, #0x3E6F LSL #16
IN0008: 000024      movk    x1, #0xE2E7 LSL #32
IN0009: 000028      ldr     x1, [x1]
IN000a: 00002C      blr     x1

G_M17772_IG03:        ; offs=0x000030, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN000f: 000030      ldp     fp, lr, [sp], #0x10
IN0010: 000034      ret     lr

G_M17772_IG04:        ; offs=0x000038, size=0x0008, bbWeight=0, PerfScore 0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], gcvars, byref

IN000b: 000038      bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
IN000c: 00003C      brk     #0
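The guard at the top of this listing can be read as: zero-extend the argument to a byte (`uxtb`), compare against the lane width of 32 (`cmp`), and branch to the throw helper on an unsigned greater-or-equal (`bhs`). A rough C++ model of that control flow follows. The names `kLaneBits` and `checkedShift` are hypothetical, and `std::out_of_range` only stands in for the runtime's ArgumentOutOfRangeException.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Illustrative model of the range guard; not the code the JIT emits.
constexpr unsigned kLaneBits = 32; // width of an int lane, in bits

inline std::uint8_t checkedShift(std::uint32_t rawArg)
{
    // uxtb w0, w0: keep only the low 8 bits of the argument register.
    const std::uint8_t shift = static_cast<std::uint8_t>(rawArg);

    // cmp w0, #32 ; bhs <throw block>
    if (shift >= kLaneBits)
    {
        throw std::out_of_range("shift");
    }
    return shift;
}
```

Only values 0 through 31 pass the guard and reach the call to the fallback method.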

This much larger function is what gets called for ShiftLeftAndInsert:

; Total bytes of code 300, prolog size 8, PerfScore 71.00, instruction count 75, allocated bytes for code 300 (MethodHash=0b353a25) for method System.Runtime.Intrinsics.Arm.Sve2:ShiftLeftAndInsert(System.Numerics.Vector`1[int],System.Numerics.Vector`1[int],ubyte):System.Numerics.Vector`1[int] (FullOpts)
; ============================================================

*************** After end code gen, before unwindEmit()
G_M50650_IG01:        ; func=00, offs=0x000000, size=0x0008, bbWeight=1, PerfScore 1.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG

IN0048: 000000      stp     fp, lr, [sp, #-0x10]!
IN0049: 000004      mov     fp, sp

G_M50650_IG02:        ; offs=0x000008, size=0x0018, bbWeight=1, PerfScore 4.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB01 [0000], byref, isz

IN0001: 000008      uxtb    w0, w0
IN0002: 00000C      cmp     w0, #32
IN0003: 000010      bhs     G_M50650_IG36
IN0004: 000014      adr     x1, [G_M50650_IG03]
IN0005: 000018      add     x1, x1, x0,  LSL #3
IN0006: 00001C      br      x1

G_M50650_IG03:        ; offs=0x000020, size=0x0008, bbWeight=1, PerfScore 2.00, BB01 [0000], extend

IN0007: 000020      sli     z0.s, z1.s, #0
IN0008: 000024      b       G_M50650_IG35

G_M50650_IG04:        ; offs=0x000028, size=0x0008, bbWeight=1, PerfScore 2.00, BB01 [0000], extend

IN0009: 000028      sli     z0.s, z1.s, #1
IN000a: 00002C      b       G_M50650_IG35

G_M50650_IG05:        ; offs=0x000030, size=0x0008, bbWeight=1, PerfScore 2.00, BB01 [0000], extend

IN000b: 000030      sli     z0.s, z1.s, #2
IN000c: 000034      b       G_M50650_IG35

G_M50650_IG06:        ; offs=0x000038, size=0x0008, bbWeight=1, PerfScore 2.00, BB01 [0000], extend

IN000d: 000038      sli     z0.s, z1.s, #3
IN000e: 00003C      b       G_M50650_IG35

G_M50650_IG07:        ; offs=0x000040, size=0x0008, bbWeight=1, PerfScore 2.00, BB01 [0000], extend

IN000f: 000040      sli     z0.s, z1.s, #4
IN0010: 000044      b       G_M50650_IG35

... and so on up to #31 ...

G_M50650_IG35:        ; offs=0x00011C, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend

IN004a: 00011C      ldp     fp, lr, [sp], #0x10
IN004b: 000120      ret     lr

G_M50650_IG36:        ; offs=0x000124, size=0x0008, bbWeight=0, PerfScore 0.00, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], gcvars, byref

IN0046: 000124      bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
IN0047: 000128      brk     #0
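The listing above is a computed branch into a run of fixed-size cases: `adr` loads the address of the first case (IG03 at offset 0x20), `add x1, x1, x0, LSL #3` adds 8 * shift because each case is two 4-byte instructions (`sli` plus `b`), and `br` jumps there. A loose C++ analogue is a table of functions, each instantiated with a compile-time shift. This is only a sketch under that analogy: the names `sliImm`, `makeTable`, `caseOffset` and `sliDispatch` are hypothetical, and a function-pointer table is an indirect call rather than the inline branch the JIT emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

// One "case" per immediate: Shift is a compile-time constant, like `sli ..., #N`.
template <unsigned Shift>
std::uint32_t sliImm(std::uint32_t dest, std::uint32_t src)
{
    constexpr std::uint32_t keepMask = (std::uint32_t{1} << Shift) - 1;
    return (dest & keepMask) | (src << Shift);
}

using SliFn = std::uint32_t (*)(std::uint32_t, std::uint32_t);

// Build a table holding sliImm<0> ... sliImm<N-1>.
template <std::size_t... I>
constexpr std::array<SliFn, sizeof...(I)> makeTable(std::index_sequence<I...>)
{
    return {{&sliImm<static_cast<unsigned>(I)>...}};
}

// Byte offset of the case for `shift` in the listing: base 0x20, 8 bytes per case.
constexpr std::size_t caseOffset(unsigned shift)
{
    return 0x20 + 8u * shift;
}

// Range check first (cmp/bhs), then dispatch on the run-time shift value.
std::uint32_t sliDispatch(std::uint32_t dest, std::uint32_t src, unsigned shift)
{
    static constexpr auto table = makeTable(std::make_index_sequence<32>{});
    if (shift >= table.size())
    {
        throw std::out_of_range("shift");
    }
    return table[shift](dest, src);
}
```

As a check against the listing, `caseOffset(4)` is 0x40, which matches the offset shown for IG07 (`sli z0.s, z1.s, #4`).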

Member

@kunalspathak kunalspathak left a comment

LGTM

@kunalspathak kunalspathak merged commit 60e900b into dotnet:main May 21, 2025
157 checks passed
SimaTian pushed a commit that referenced this pull request May 27, 2025
* Implement SVE2 ShiftLeftAndInsert

* Remove explicit immediate bounds setting
Labels: area-System.Runtime.Intrinsics, arm-sve, community-contribution