@mkannwischer mkannwischer commented Nov 28, 2025

iNTT speed-ups:

| Platform | Before (main) | After | Speedup |
|---|---|---|---|
| Mac Mini (M1) | 434 | 427 | 1.016x |
| Graviton2 | 1970 | 1745 | 1.129x |
| Graviton3 | 876 | 690 | 1.270x |
| Graviton4 | 785 | 657 | 1.195x |
| Cortex-A76 (RPi 5) | 1970 | 1744 | 1.130x |
| Cortex-A72 (RPi 4) | 2441 | 1902 | 1.283x |
| Cortex-A55 (Snapdragon) | 3519 | 2428 | 1.449x |
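
For anyone re-checking the numbers: the Speedup column above is consistent with Before divided by After, rounded to three decimals. A minimal sketch that recomputes it from the figures in the table; the names `INTT_RESULTS` and `speedup` are made up for this illustration and are not part of the repository.

```python
# Before/after figures copied from the iNTT table above
# (same units as reported there; the PR does not restate them).
INTT_RESULTS = {
    "Mac Mini (M1)": (434, 427),
    "Graviton2": (1970, 1745),
    "Graviton3": (876, 690),
    "Graviton4": (785, 657),
    "Cortex-A76 (RPi 5)": (1970, 1744),
    "Cortex-A72 (RPi 4)": (2441, 1902),
    "Cortex-A55 (Snapdragon)": (3519, 2428),
}


def speedup(before: int, after: int) -> float:
    """Return before/after; values above 1.0 mean the new code is faster."""
    return before / after


if __name__ == "__main__":
    # Print one line per platform, formatted like the Speedup column.
    for platform, (before, after) in INTT_RESULTS.items():
        print(f"{platform:<26} {speedup(before, after):.3f}x")
```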

@mkannwischer mkannwischer force-pushed the slothy-intt branch 2 times, most recently from b99e917 to 308107d Compare November 28, 2025 05:11

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 46398 cycles 46414 cycles 1.00
ML-DSA-44 sign 131854 cycles 132001 cycles 1.00
ML-DSA-44 verify 47791 cycles 47801 cycles 1.00
ML-DSA-65 keypair 81320 cycles 81338 cycles 1.00
ML-DSA-65 sign 218021 cycles 218252 cycles 1.00
ML-DSA-65 verify 80057 cycles 80072 cycles 1.00
ML-DSA-87 keypair 132433 cycles 132477 cycles 1.00
ML-DSA-87 sign 279539 cycles 279832 cycles 1.00
ML-DSA-87 verify 130404 cycles 130427 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 114369 cycles 114382 cycles 1.00
ML-DSA-44 sign 428771 cycles 428727 cycles 1.00
ML-DSA-44 verify 121521 cycles 121527 cycles 1.00
ML-DSA-65 keypair 196266 cycles 196236 cycles 1.00
ML-DSA-65 sign 697624 cycles 697597 cycles 1.00
ML-DSA-65 verify 196490 cycles 196451 cycles 1.00
ML-DSA-87 keypair 323099 cycles 323068 cycles 1.00
ML-DSA-87 sign 880192 cycles 880149 cycles 1.00
ML-DSA-87 verify 327024 cycles 326950 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 35497 cycles 35143 cycles 1.01
ML-DSA-44 sign 121073 cycles 121147 cycles 1.00
ML-DSA-44 verify 38211 cycles 38343 cycles 1.00
ML-DSA-65 keypair 63058 cycles 62033 cycles 1.02
ML-DSA-65 sign 201564 cycles 200225 cycles 1.01
ML-DSA-65 verify 63214 cycles 62992 cycles 1.00
ML-DSA-87 keypair 95083 cycles 94071 cycles 1.01
ML-DSA-87 sign 234938 cycles 230071 cycles 1.02
ML-DSA-87 verify 94549 cycles 95065 cycles 0.99

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 95910 cycles 95749 cycles 1.00
ML-DSA-44 sign 349210 cycles 348923 cycles 1.00
ML-DSA-44 verify 101723 cycles 101599 cycles 1.00
ML-DSA-65 keypair 163623 cycles 163400 cycles 1.00
ML-DSA-65 sign 565612 cycles 564714 cycles 1.00
ML-DSA-65 verify 166016 cycles 165902 cycles 1.00
ML-DSA-87 keypair 267621 cycles 267773 cycles 1.00
ML-DSA-87 sign 723411 cycles 723169 cycles 1.00
ML-DSA-87 verify 273113 cycles 272914 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 276121 cycles 285745 cycles 0.97
ML-DSA-44 sign 825067 cycles 894122 cycles 0.92
ML-DSA-44 verify 273925 cycles 280893 cycles 0.98
ML-DSA-65 keypair 475121 cycles 486508 cycles 0.98
ML-DSA-65 sign 1365693 cycles 1463974 cycles 0.93
ML-DSA-65 verify 451767 cycles 465274 cycles 0.97
ML-DSA-87 keypair 805587 cycles 832477 cycles 0.97
ML-DSA-87 sign 1835576 cycles 2000183 cycles 0.92
ML-DSA-87 verify 773596 cycles 798546 cycles 0.97
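
The rows above are consistent with Ratio being Current divided by Previous, so a value below 1.00 means the new commit needs fewer cycles. A small sketch that turns the three sign rows into a percentage reduction; the helper names `ratio` and `percent_fewer_cycles` and the dictionary `A55_SIGN` are assumptions for this example, not output of the benchmark workflow.

```python
# (current, previous) cycle counts for signing, copied from the
# Cortex-A55 (opt) table above.
A55_SIGN = {
    "ML-DSA-44": (825067, 894122),
    "ML-DSA-65": (1365693, 1463974),
    "ML-DSA-87": (1835576, 2000183),
}


def ratio(current: int, previous: int) -> float:
    """Current / Previous, matching how the Ratio column appears to be computed."""
    return current / previous


def percent_fewer_cycles(current: int, previous: int) -> float:
    """Relative reduction in cycles, in percent (positive means faster)."""
    return (1.0 - current / previous) * 100.0


if __name__ == "__main__":
    for name, (cur, prev) in A55_SIGN.items():
        print(f"{name} sign: ratio {ratio(cur, prev):.2f}, "
              f"{percent_fewer_cycles(cur, prev):.1f}% fewer cycles")
```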

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 69206 cycles 69206 cycles 1
ML-DSA-44 sign 184245 cycles 184387 cycles 1.00
ML-DSA-44 verify 69151 cycles 69106 cycles 1.00
ML-DSA-65 keypair 119169 cycles 119372 cycles 1.00
ML-DSA-65 sign 294909 cycles 295603 cycles 1.00
ML-DSA-65 verify 115188 cycles 115375 cycles 1.00
ML-DSA-87 keypair 203578 cycles 203802 cycles 1.00
ML-DSA-87 sign 388167 cycles 387905 cycles 1.00
ML-DSA-87 verify 195857 cycles 195698 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 68739 cycles 69152 cycles 0.99
ML-DSA-44 sign 202889 cycles 208483 cycles 0.97
ML-DSA-44 verify 70765 cycles 71242 cycles 0.99
ML-DSA-65 keypair 121544 cycles 122178 cycles 0.99
ML-DSA-65 sign 331812 cycles 342020 cycles 0.97
ML-DSA-65 verify 117588 cycles 118474 cycles 0.99
ML-DSA-87 keypair 198973 cycles 199864 cycles 1.00
ML-DSA-87 sign 428789 cycles 439835 cycles 0.97
ML-DSA-87 verify 194463 cycles 195400 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 57596 cycles 57148 cycles 1.01
ML-DSA-44 sign 180276 cycles 179819 cycles 1.00
ML-DSA-44 verify 61330 cycles 61125 cycles 1.00
ML-DSA-65 keypair 99552 cycles 99980 cycles 1.00
ML-DSA-65 sign 295981 cycles 297406 cycles 1.00
ML-DSA-65 verify 101064 cycles 101235 cycles 1.00
ML-DSA-87 keypair 154022 cycles 154320 cycles 1.00
ML-DSA-87 sign 353635 cycles 354598 cycles 1.00
ML-DSA-87 verify 153339 cycles 153733 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 115425 cycles 115728 cycles 1.00
ML-DSA-44 sign 364333 cycles 373939 cycles 0.97
ML-DSA-44 verify 119521 cycles 119966 cycles 1.00
ML-DSA-65 keypair 198063 cycles 199386 cycles 0.99
ML-DSA-65 sign 597460 cycles 615426 cycles 0.97
ML-DSA-65 verify 194953 cycles 196625 cycles 0.99
ML-DSA-87 keypair 324508 cycles 326647 cycles 0.99
ML-DSA-87 sign 761477 cycles 784067 cycles 0.97
ML-DSA-87 verify 320619 cycles 322593 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 128248 cycles 128410 cycles 1.00
ML-DSA-44 sign 457074 cycles 456811 cycles 1.00
ML-DSA-44 verify 136266 cycles 136364 cycles 1.00
ML-DSA-65 keypair 220925 cycles 220811 cycles 1.00
ML-DSA-65 sign 745590 cycles 746754 cycles 1.00
ML-DSA-65 verify 220422 cycles 220734 cycles 1.00
ML-DSA-87 keypair 365098 cycles 365323 cycles 1.00
ML-DSA-87 sign 944236 cycles 943162 cycles 1.00
ML-DSA-87 verify 369008 cycles 369319 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 135820 cycles 135683 cycles 1.00
ML-DSA-44 sign 540804 cycles 540540 cycles 1.00
ML-DSA-44 verify 148978 cycles 148890 cycles 1.00
ML-DSA-65 keypair 229064 cycles 228278 cycles 1.00
ML-DSA-65 sign 891984 cycles 889005 cycles 1.00
ML-DSA-65 verify 238549 cycles 237556 cycles 1.00
ML-DSA-87 keypair 373012 cycles 374889 cycles 0.99
ML-DSA-87 sign 1104955 cycles 1108915 cycles 1.00
ML-DSA-87 verify 387383 cycles 388708 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 72610 cycles 73277 cycles 0.99
ML-DSA-44 sign 213227 cycles 221350 cycles 0.96
ML-DSA-44 verify 75647 cycles 76289 cycles 0.99
ML-DSA-65 keypair 128411 cycles 129514 cycles 0.99
ML-DSA-65 sign 353239 cycles 367951 cycles 0.96
ML-DSA-65 verify 125610 cycles 126662 cycles 0.99
ML-DSA-87 keypair 206984 cycles 210841 cycles 0.98
ML-DSA-87 sign 445951 cycles 467821 cycles 0.95
ML-DSA-87 verify 205858 cycles 206335 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 158807 cycles 159036 cycles 1.00
ML-DSA-44 sign 565120 cycles 565513 cycles 1.00
ML-DSA-44 verify 170104 cycles 170189 cycles 1.00
ML-DSA-65 keypair 270096 cycles 270337 cycles 1.00
ML-DSA-65 sign 925021 cycles 926558 cycles 1.00
ML-DSA-65 verify 276337 cycles 276745 cycles 1.00
ML-DSA-87 keypair 451464 cycles 451390 cycles 1.00
ML-DSA-87 sign 1182648 cycles 1184163 cycles 1.00
ML-DSA-87 verify 461290 cycles 461812 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 214615 cycles 214703 cycles 1.00
ML-DSA-44 sign 782904 cycles 782961 cycles 1.00
ML-DSA-44 verify 230602 cycles 230723 cycles 1.00
ML-DSA-65 keypair 385499 cycles 385351 cycles 1.00
ML-DSA-65 sign 1309947 cycles 1310117 cycles 1.00
ML-DSA-65 verify 376028 cycles 376231 cycles 1.00
ML-DSA-87 keypair 607490 cycles 607560 cycles 1.00
ML-DSA-87 sign 1655444 cycles 1656700 cycles 1.00
ML-DSA-87 verify 618074 cycles 618424 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 40979 cycles 41813 cycles 0.98
ML-DSA-44 sign 129104 cycles 129629 cycles 1.00
ML-DSA-44 verify 43279 cycles 43610 cycles 0.99
ML-DSA-65 keypair 72204 cycles 72392 cycles 1.00
ML-DSA-65 sign 211025 cycles 212110 cycles 0.99
ML-DSA-65 verify 73573 cycles 73873 cycles 1.00
ML-DSA-87 keypair 109362 cycles 109524 cycles 1.00
ML-DSA-87 sign 247835 cycles 249633 cycles 0.99
ML-DSA-87 verify 112117 cycles 110170 cycles 1.02

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 138756 cycles 138845 cycles 1.00
ML-DSA-44 sign 493651 cycles 493534 cycles 1.00
ML-DSA-44 verify 148309 cycles 148510 cycles 1.00
ML-DSA-65 keypair 242702 cycles 242325 cycles 1.00
ML-DSA-65 sign 808721 cycles 808829 cycles 1.00
ML-DSA-65 verify 240717 cycles 240967 cycles 1.00
ML-DSA-87 keypair 396445 cycles 396798 cycles 1.00
ML-DSA-87 sign 1027156 cycles 1026810 cycles 1.00
ML-DSA-87 verify 401766 cycles 402035 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 120131 cycles 121135 cycles 0.99
ML-DSA-44 sign 455762 cycles 455691 cycles 1.00
ML-DSA-44 verify 130384 cycles 130171 cycles 1.00
ML-DSA-65 keypair 204342 cycles 206078 cycles 0.99
ML-DSA-65 sign 734962 cycles 734493 cycles 1.00
ML-DSA-65 verify 208973 cycles 210555 cycles 0.99
ML-DSA-87 keypair 337202 cycles 337989 cycles 1.00
ML-DSA-87 sign 924298 cycles 922942 cycles 1.00
ML-DSA-87 verify 344714 cycles 345245 cycles 1.00

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 827587 cycles 827337 cycles 1.00
ML-DSA-44 sign 3333337 cycles 3331425 cycles 1.00
ML-DSA-44 verify 920188 cycles 919913 cycles 1.00
ML-DSA-65 keypair 1402437 cycles 1404508 cycles 1.00
ML-DSA-65 sign 5443872 cycles 5442876 cycles 1.00
ML-DSA-65 verify 1470631 cycles 1469680 cycles 1.00
ML-DSA-87 keypair 2304223 cycles 2306569 cycles 1.00
ML-DSA-87 sign 6818211 cycles 6817332 cycles 1.00
ML-DSA-87 verify 2407130 cycles 2402780 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 464921 cycles 464556 cycles 1.00
ML-DSA-44 sign 2212617 cycles 2208657 cycles 1.00
ML-DSA-44 verify 546747 cycles 545962 cycles 1.00
ML-DSA-65 keypair 779753 cycles 777602 cycles 1.00
ML-DSA-65 sign 3629690 cycles 3608012 cycles 1.01
ML-DSA-65 verify 850546 cycles 847136 cycles 1.00
ML-DSA-87 keypair 1255823 cycles 1253990 cycles 1.00
ML-DSA-87 sign 4472890 cycles 4443324 cycles 1.01
ML-DSA-87 verify 1361597 cycles 1361371 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 114069 cycles 114842 cycles 0.99
ML-DSA-44 sign 361125 cycles 371794 cycles 0.97
ML-DSA-44 verify 118214 cycles 119310 cycles 0.99
ML-DSA-65 keypair 197806 cycles 199034 cycles 0.99
ML-DSA-65 sign 597002 cycles 614739 cycles 0.97
ML-DSA-65 verify 194692 cycles 196379 cycles 0.99
ML-DSA-87 keypair 323963 cycles 326202 cycles 0.99
ML-DSA-87 sign 760578 cycles 783241 cycles 0.97
ML-DSA-87 verify 320320 cycles 322445 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 214154 cycles 213868 cycles 1.00
ML-DSA-44 sign 782106 cycles 784182 cycles 1.00
ML-DSA-44 verify 230079 cycles 230054 cycles 1.00
ML-DSA-65 keypair 384873 cycles 384979 cycles 1.00
ML-DSA-65 sign 1326370 cycles 1314553 cycles 1.01
ML-DSA-65 verify 375442 cycles 375792 cycles 1.00
ML-DSA-87 keypair 606660 cycles 606983 cycles 1.00
ML-DSA-87 sign 1652718 cycles 1654364 cycles 1.00
ML-DSA-87 verify 617702 cycles 618211 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 227884 cycles 229441 cycles 0.99
ML-DSA-44 sign 640944 cycles 675710 cycles 0.95
ML-DSA-44 verify 231954 cycles 238273 cycles 0.97
ML-DSA-65 keypair 389453 cycles 405551 cycles 0.96
ML-DSA-65 sign 1045279 cycles 1085789 cycles 0.96
ML-DSA-65 verify 379960 cycles 383505 cycles 0.99
ML-DSA-87 keypair 655079 cycles 677320 cycles 0.97
ML-DSA-87 sign 1347200 cycles 1446952 cycles 0.93
ML-DSA-87 verify 633744 cycles 643185 cycles 0.99

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 308107d Previous: 2948ece Ratio
ML-DSA-44 keypair 308894 cycles 313340 cycles 0.99
ML-DSA-44 sign 1203258 cycles 1210106 cycles 0.99
ML-DSA-44 verify 332723 cycles 345251 cycles 0.96
ML-DSA-65 keypair 581830 cycles 573758 cycles 1.01
ML-DSA-65 sign 1985838 cycles 2022886 cycles 0.98
ML-DSA-65 verify 547594 cycles 546429 cycles 1.00
ML-DSA-87 keypair 891983 cycles 880785 cycles 1.01
ML-DSA-87 sign 2521635 cycles 2523515 cycles 1.00
ML-DSA-87 verify 903880 cycles 908652 cycles 0.99

@mkannwischer mkannwischer marked this pull request as ready for review November 28, 2025 05:45
@mkannwischer mkannwischer requested a review from a team as a code owner November 28, 2025 05:45
@mkannwischer
Contributor Author

We first need to merge slothy-optimizer/slothy#363 and then update the SLOTHY commit.

@mkannwischer
Contributor Author

> We first need to merge slothy-optimizer/slothy#363 and then update the SLOTHY commit.

It has been merged and I updated the SLOTHY commit.

Resolves #206

Signed-off-by: Matthias J. Kannwischer <[email protected]>
@mkannwischer mkannwischer merged commit 1915e47 into main Dec 1, 2025
332 checks passed
@mkannwischer mkannwischer deleted the slothy-intt branch December 1, 2025 03:53

Development

Successfully merging this pull request may close these issues.

Run Neon NTT/iNTT through SLOTHY
