
Add Optimized and HOL Light verified AVX2 Keccak x4 #3020

Open
manastasova wants to merge 2 commits into aws:main from manastasova:avx2_keccak_x4_hollight



manastasova (Contributor) commented Feb 20, 2026

Issues:

Import AVX2 Optimized and HOL Light verified 4x Keccak permutation awslabs/s2n-bignum#354

NOTE: Once awslabs/s2n-bignum#354 is merged, the assembly files will be imported directly with the importer script.

Description of changes:

This PR introduces an optimized AVX2 implementation of the Keccak-f[1600] x4 permutation, formally verified using HOL Light. This batched implementation runs four independent Keccak-f[1600] permutations in parallel using AVX2 SIMD instructions, accelerating the hashing that underlies ML-KEM (FIPS 203) and ML-DSA (FIPS 204): on the tested instances, average throughput improves by +29.2% to +51.0% for ML-KEM and by +16.6% to +34.7% for ML-DSA.
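For readers less familiar with batched Keccak, here is a minimal portable C sketch of what a 4-way Keccak-f[1600] computes. It is not the AVX2 or HOL Light verified code from this PR. It assumes a lane-interleaved state layout (lane i of instance n stored at index 4*i + n), which mirrors how one 256-bit AVX2 register can hold the same 64-bit lane of four states; the names `keccak_f1600_x4_ref` and `LANE` are hypothetical. Every innermost loop over `n` applies the same operation to all four instances, which is the work a single 256-bit instruction would do in a SIMD version.

```c
#include <stdint.h>

#define KECCAK_WAYS 4
/* Hypothetical interleaved layout: lane i (0..24) of instance n (0..3). */
#define LANE(s, i, n) ((s)[KECCAK_WAYS * (i) + (n)])

/* Iota round constants for the 24 rounds of Keccak-f[1600]. */
static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
    0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
    0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};

/* Rho rotation offsets and pi destinations, in the chained order used by
   the compact "rho and pi in one pass" formulation. */
static const unsigned ROTC[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                                  27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const int PILN[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                             15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

/* AVX2 has no 64-bit vector rotate, so a SIMD version would build this
   from two shifts and an OR on a 256-bit register. */
static uint64_t rotl64(uint64_t x, unsigned r) {
  return (x << r) | (x >> (64 - r));
}

/* Applies Keccak-f[1600] independently to four interleaved states. */
void keccak_f1600_x4_ref(uint64_t s[25 * KECCAK_WAYS]) {
  uint64_t bc[5][KECCAK_WAYS], t[KECCAK_WAYS], tmp;
  for (int r = 0; r < 24; r++) {
    /* theta: column parities, then mix neighbouring columns into each lane */
    for (int i = 0; i < 5; i++)
      for (int n = 0; n < KECCAK_WAYS; n++)
        bc[i][n] = LANE(s, i, n) ^ LANE(s, i + 5, n) ^ LANE(s, i + 10, n) ^
                   LANE(s, i + 15, n) ^ LANE(s, i + 20, n);
    for (int i = 0; i < 5; i++)
      for (int n = 0; n < KECCAK_WAYS; n++) {
        tmp = bc[(i + 4) % 5][n] ^ rotl64(bc[(i + 1) % 5][n], 1);
        for (int y = 0; y < 25; y += 5) LANE(s, y + i, n) ^= tmp;
      }
    /* rho + pi: rotate each lane and move it to its new position */
    for (int n = 0; n < KECCAK_WAYS; n++) t[n] = LANE(s, 1, n);
    for (int i = 0; i < 24; i++) {
      int j = PILN[i];
      for (int n = 0; n < KECCAK_WAYS; n++) {
        tmp = LANE(s, j, n);
        LANE(s, j, n) = rotl64(t[n], ROTC[i]);
        t[n] = tmp;
      }
    }
    /* chi: non-linear step on each row (a SIMD version would use ANDN/XOR) */
    for (int y = 0; y < 25; y += 5)
      for (int n = 0; n < KECCAK_WAYS; n++) {
        uint64_t row[5];
        for (int i = 0; i < 5; i++) row[i] = LANE(s, y + i, n);
        for (int i = 0; i < 5; i++)
          LANE(s, y + i, n) ^= ~row[(i + 1) % 5] & row[(i + 2) % 5];
      }
    /* iota: the same round constant goes into lane 0 of every instance */
    for (int n = 0; n < KECCAK_WAYS; n++) LANE(s, 0, n) ^= RC[r];
  }
}
```

Because the four instances never interact, each one ends up exactly as a single-state Keccak-f[1600] of its own input would; for an all-zero state, lane 0 of every instance becomes 0xF1258F7940E1DDE7.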

The 4-way parallel Keccak permutation is a critical building block for lattice-based cryptographic schemes, as it is heavily used in:

  • ML-KEM: Matrix/vector sampling, seed expansion, and hash operations during keygen, encapsulation, and decapsulation
  • ML-DSA: Key generation, signing (rejection sampling), and verification
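As an illustration of the ML-KEM matrix-sampling use case, the sketch below loads four SampleNTT seeds into one x4 SHAKE128 state so that a single 4-way permutation call serves four matrix entries. FIPS 203 samples entry Â[i,j] from SHAKE128(ρ‖j‖i), a 34-byte input that fits in one 168-byte rate block. The interleaved layout and the name `shake128x4_absorb_seeds` are assumptions for illustration, not the interface used in this PR.

```c
#include <stdint.h>
#include <string.h>

#define X4_WAYS 4
#define SHAKE128_RATE 168 /* bytes absorbed per Keccak-f[1600] call */

/* Hypothetical helper: loads four 34-byte seeds rho || j || i (FIPS 203
   order: column j first, then row i) into a zeroed, lane-interleaved x4
   SHAKE128 state, applying SHAKE padding. ji[n] = {j, i} for instance n. */
void shake128x4_absorb_seeds(uint64_t s[25 * X4_WAYS], const uint8_t rho[32],
                             const uint8_t ji[X4_WAYS][2]) {
  memset(s, 0, 25 * X4_WAYS * sizeof(uint64_t));
  for (int n = 0; n < X4_WAYS; n++) {
    uint8_t block[SHAKE128_RATE] = {0};
    memcpy(block, rho, 32);
    block[32] = ji[n][0];
    block[33] = ji[n][1];
    block[34] ^= 0x1F;                /* SHAKE suffix plus first padding bit */
    block[SHAKE128_RATE - 1] ^= 0x80; /* final padding bit */
    /* Little-endian bytes -> 64-bit lanes; only the 21 rate lanes change,
       the 4 capacity lanes stay zero. */
    for (int w = 0; w < SHAKE128_RATE / 8; w++) {
      uint64_t lane = 0;
      for (int b = 0; b < 8; b++)
        lane |= (uint64_t)block[8 * w + b] << (8 * b);
      s[X4_WAYS * w + n] ^= lane;
    }
  }
}
```

After this step, one x4 permutation call yields the first 168 output bytes of all four XOFs, which SampleNTT consumes 3 bytes at a time; if rejection sampling needs more bytes, the permutation is simply applied again without re-absorbing.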

Performance Results

The optimization delivers substantial throughput improvements across all tested EC2 instance types:

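Average speedup per parameter set (arithmetic mean of the per-operation speedups listed under More Performance Data: keygen, encaps, and decaps for ML-KEM; keygen, signing, and verify for ML-DSA):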

| Algorithm | c7i | c7a | c6i | c6a |
|---|---|---|---|---|
| ML-KEM-512 | +29.2% | +38.7% | +41.4% | +37.6% |
| ML-KEM-768 | +29.3% | +37.6% | +37.4% | +37.4% |
| ML-KEM-1024 | +34.8% | +46.7% | +51.0% | +48.4% |
| MLDSA44 | +16.6% | +17.9% | +23.0% | +21.1% |
| MLDSA65 | +19.6% | +18.4% | +23.9% | +20.8% |
| MLDSA87 | +28.5% | +28.0% | +34.7% | +31.9% |
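The summary figures can be reproduced from the per-operation tables: each per-operation speedup is New / Original - 1, and each summary cell is the arithmetic mean of the three per-operation speedups for that parameter set and instance (for example, c7i ML-KEM-512: (35.1% + 28.7% + 23.8%) / 3 ≈ 29.2%). A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Percentage speedup of new throughput over original, both in ops/sec. */
double speedup_pct(double original_ops, double new_ops) {
  return 100.0 * (new_ops / original_ops - 1.0);
}

/* Arithmetic mean of per-operation speedups for one parameter set,
   e.g. {keygen, encaps, decaps} for ML-KEM. */
double mean_speedup_pct(const double original_ops[], const double new_ops[],
                        int count) {
  double sum = 0.0;
  for (int k = 0; k < count; k++)
    sum += speedup_pct(original_ops[k], new_ops[k]);
  return sum / count;
}
```

For instance, speedup_pct(30333.7, 48230.3) gives about 59.0, the c6i ML-KEM-1024 keygen peak quoted in the highlights.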

Notable highlights:

  • Peak speedup of +59.0% for ML-KEM-1024 keygen on c6i
  • ML-KEM-1024 has the highest average speedup on every instance type (from +34.8% on c7i to +51.0% on c6i), as larger parameter sets invoke more Keccak calls
  • MLDSA signing shows modest gains (+2.1% to +12.6%), since its runtime is dominated by the rejection-sampling loop rather than by Keccak permutation throughput
  • Improvements are consistent across Intel (c7i, c6i) and AMD (c7a, c6a) platforms, and across both current-generation (Gen 7) and previous-generation (Gen 6) instances
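To make the point about larger parameter sets concrete, the sketch below counts only the public-matrix entries each parameter set expands from its seed (k x k for ML-KEM per FIPS 203, k x l for ML-DSA per FIPS 204) and how they group into 4-way batches. It assumes entries are sampled four at a time, ignores extra squeeze blocks caused by rejection sampling, and ignores all other hashing (PRF, H, G, and so on); the names are hypothetical.

```c
#include <stdio.h>

/* Public-matrix dimensions per parameter set. */
struct param_set {
  const char *name;
  int rows;
  int cols;
};

static const struct param_set PARAMS[] = {
    {"ML-KEM-512", 2, 2}, {"ML-KEM-768", 3, 3}, {"ML-KEM-1024", 4, 4},
    {"ML-DSA-44", 4, 4},  {"ML-DSA-65", 6, 5},  {"ML-DSA-87", 8, 7},
};

/* Number of x4 calls in which all four lanes carry a matrix entry. */
int x4_full_batches(int entries) { return entries / 4; }

/* Entries left over after the full batches; an implementation may handle
   them with a partially filled x4 call or with single-state calls. */
int x4_leftover(int entries) { return entries % 4; }

void print_matrix_batching(void) {
  for (size_t p = 0; p < sizeof(PARAMS) / sizeof(PARAMS[0]); p++) {
    int entries = PARAMS[p].rows * PARAMS[p].cols;
    printf("%-12s %2d matrix entries: %2d full x4 batches, %d left over\n",
           PARAMS[p].name, entries, x4_full_batches(entries),
           x4_leftover(entries));
  }
}
```

ML-KEM-1024's 16 entries and ML-DSA-87's 56 entries split evenly into 4 and 14 full batches, while ML-KEM-768's 9 entries leave one entry over.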

Call-outs:

  • Reviewers should pay attention to the integration points where the new Keccak x4 is wired into the ML-KEM and ML-DSA call paths

Testing:

  • All existing ML-KEM and ML-DSA Known Answer Tests (KATs) pass, confirming functional correctness
    ./crypto/crypto_test
  • Performance was benchmarked with ./tool/bssl speed on four EC2 instance types (c7i, c7a, c6i, c6a) to measure the throughput improvements
    ./tool/bssl speed -filter "ML-KEM"
    ./tool/bssl speed -filter "MLDSA"

More Performance Data

EC2 c7i

| Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup |
|---|---|---|---|---|
| ML-KEM-512 | keygen | 102,039.0 | 137,883.4 | +35.1% |
| ML-KEM-512 | encaps | 92,432.1 | 118,961.3 | +28.7% |
| ML-KEM-512 | decaps | 77,155.5 | 95,523.5 | +23.8% |
| ML-KEM-768 | keygen | 65,240.3 | 86,148.5 | +32.1% |
| ML-KEM-768 | encaps | 60,583.8 | 79,416.7 | +31.1% |
| ML-KEM-768 | decaps | 51,275.5 | 64,007.9 | +24.8% |
| ML-KEM-1024 | keygen | 43,752.6 | 62,079.1 | +41.9% |
| ML-KEM-1024 | encaps | 40,528.9 | 54,745.9 | +35.1% |
| ML-KEM-1024 | decaps | 35,182.9 | 44,833.4 | +27.4% |
| MLDSA44 | keygen | 19,594.8 | 23,784.0 | +21.4% |
| MLDSA44 | signing | 4,776.1 | 5,105.4 | +6.9% |
| MLDSA44 | verify | 18,485.0 | 22,439.7 | +21.4% |
| MLDSA65 | keygen | 10,078.2 | 12,485.9 | +23.9% |
| MLDSA65 | signing | 3,030.3 | 3,263.0 | +7.7% |
| MLDSA65 | verify | 11,629.3 | 14,807.7 | +27.3% |
| MLDSA87 | keygen | 7,177.4 | 9,908.2 | +38.0% |
| MLDSA87 | signing | 2,534.6 | 2,776.0 | +9.5% |
| MLDSA87 | verify | 7,049.3 | 9,737.1 | +38.1% |

EC2 c7a

| Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup |
|---|---|---|---|---|
| ML-KEM-512 | keygen | 94,563.7 | 137,392.5 | +45.3% |
| ML-KEM-512 | encaps | 85,020.5 | 118,473.0 | +39.3% |
| ML-KEM-512 | decaps | 71,284.2 | 93,645.4 | +31.4% |
| ML-KEM-768 | keygen | 56,037.5 | 79,772.7 | +42.4% |
| ML-KEM-768 | encaps | 52,744.2 | 73,353.1 | +39.1% |
| ML-KEM-768 | decaps | 44,832.4 | 58,874.9 | +31.3% |
| ML-KEM-1024 | keygen | 37,007.2 | 56,511.5 | +52.7% |
| ML-KEM-1024 | encaps | 34,843.2 | 51,659.7 | +48.3% |
| ML-KEM-1024 | decaps | 30,052.9 | 41,833.0 | +39.2% |
| MLDSA44 | keygen | 17,087.6 | 21,781.5 | +27.5% |
| MLDSA44 | signing | 3,833.2 | 3,941.9 | +2.8% |
| MLDSA44 | verify | 15,055.9 | 18,594.2 | +23.5% |
| MLDSA65 | keygen | 9,295.8 | 11,665.2 | +25.5% |
| MLDSA65 | signing | 2,418.1 | 2,468.3 | +2.1% |
| MLDSA65 | verify | 9,658.5 | 12,321.2 | +27.6% |
| MLDSA87 | keygen | 6,458.0 | 9,079.5 | +40.6% |
| MLDSA87 | signing | 2,021.0 | 2,147.5 | +6.3% |
| MLDSA87 | verify | 6,094.7 | 8,355.6 | +37.1% |

EC2 c6i

| Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup |
|---|---|---|---|---|
| ML-KEM-512 | keygen | 74,243.5 | 110,577.2 | +48.9% |
| ML-KEM-512 | encaps | 66,855.4 | 94,949.4 | +42.0% |
| ML-KEM-512 | decaps | 56,641.7 | 75,435.7 | +33.2% |
| ML-KEM-768 | keygen | 44,861.7 | 63,758.2 | +42.1% |
| ML-KEM-768 | encaps | 42,938.1 | 59,149.7 | +37.7% |
| ML-KEM-768 | decaps | 35,953.7 | 47,646.6 | +32.5% |
| ML-KEM-1024 | keygen | 30,333.7 | 48,230.3 | +59.0% |
| ML-KEM-1024 | encaps | 28,685.5 | 43,577.8 | +51.9% |
| ML-KEM-1024 | decaps | 23,941.1 | 33,996.0 | +42.0% |
| MLDSA44 | keygen | 14,708.0 | 19,526.7 | +32.8% |
| MLDSA44 | signing | 3,488.7 | 3,693.8 | +5.9% |
| MLDSA44 | verify | 13,581.8 | 17,702.9 | +30.3% |
| MLDSA65 | keygen | 7,868.8 | 10,223.9 | +29.9% |
| MLDSA65 | signing | 2,153.5 | 2,309.9 | +7.3% |
| MLDSA65 | verify | 8,542.6 | 11,490.7 | +34.5% |
| MLDSA87 | keygen | 5,428.8 | 8,082.0 | +48.9% |
| MLDSA87 | signing | 1,819.2 | 1,973.5 | +8.5% |
| MLDSA87 | verify | 5,258.6 | 7,708.2 | +46.6% |

EC2 c6a

| Algorithm | Operation | Original (ops/sec) | New (ops/sec) | Speedup |
|---|---|---|---|---|
| ML-KEM-512 | keygen | 94,817.9 | 138,020.5 | +45.6% |
| ML-KEM-512 | encaps | 87,240.8 | 120,129.4 | +37.7% |
| ML-KEM-512 | decaps | 72,457.8 | 93,790.5 | +29.4% |
| ML-KEM-768 | keygen | 60,954.7 | 87,137.1 | +42.9% |
| ML-KEM-768 | encaps | 57,065.8 | 79,197.9 | +38.8% |
| ML-KEM-768 | decaps | 47,498.6 | 62,029.9 | +30.6% |
| ML-KEM-1024 | keygen | 41,115.4 | 64,320.3 | +56.4% |
| ML-KEM-1024 | encaps | 38,250.1 | 57,234.5 | +49.6% |
| ML-KEM-1024 | decaps | 32,493.6 | 45,190.5 | +39.1% |
| MLDSA44 | keygen | 16,540.2 | 21,594.4 | +30.6% |
| MLDSA44 | signing | 3,670.8 | 3,916.2 | +6.7% |
| MLDSA44 | verify | 14,447.5 | 18,216.4 | +26.1% |
| MLDSA65 | keygen | 8,977.5 | 11,496.0 | +28.0% |
| MLDSA65 | signing | 2,346.1 | 2,422.6 | +3.3% |
| MLDSA65 | verify | 9,377.8 | 12,294.7 | +31.1% |
| MLDSA87 | keygen | 6,224.4 | 8,939.6 | +43.6% |
| MLDSA87 | signing | 1,961.1 | 2,208.0 | +12.6% |
| MLDSA87 | verify | 5,862.3 | 8,183.7 | +39.6% |

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

codecov-commenter commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.34%. Comparing base (c6d7b33) to head (ee4fc2f).

@@            Coverage Diff             @@
##             main    #3020      +/-   ##
==========================================
- Coverage   78.51%   78.34%   -0.18%     
==========================================
  Files         689      689              
  Lines      121018   121019       +1     
  Branches    16999    16969      -30     
==========================================
- Hits        95021    94809     -212     
- Misses      25097    25315     +218     
+ Partials      900      895       -5     


sgmenda (Contributor) commented Mar 6, 2026

@manastasova is this ready for review or waiting on the s2n-bignum merge?
