Import ML-KEM from mlkem-native/PQ code package #2041

bhess · 2025-01-13T13:34:07Z

This PR tracks the integration of ML-KEM from the mlkem-native upstream repository.
It replaces the current ML-KEM implementation in liboqs, which was previously imported from pq-crystals, with the mlkem-native implementation from PQCP.

Some features of mlkem-native:

Portable C implementation (C90 compliant)
Optimized implementation for x86_64
Optimized implementation for ARM64
Formal verification

The upstream code recently had a v1.0.0-alpha release and is actively maintained. The goal is to synchronize the PR with an upcoming tagged release of mlkem-native.

Additionally, the upstream code enables the key validation defined by FIPS 203 by default, which resolves issue #1951.

Closes #1951.
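For readers unfamiliar with the key validation mentioned above, here is a minimal Python sketch of the two key checks that FIPS 203 describes for ML-KEM inputs: the modulus check on encapsulation keys and the hash check on decapsulation keys. It is an illustration only, not the mlkem-native code; the function names `decode12`, `encaps_key_check` and `decaps_key_check` are made up for this example, and `k` is the module rank (2, 3, 4 for ML-KEM-512/768/1024).

```python
import hashlib

Q = 3329  # ML-KEM modulus


def decode12(data: bytes) -> list[int]:
    """Unpack little-endian 12-bit integers (every 3 bytes hold 2 values)."""
    if len(data) % 3 != 0:
        raise ValueError("input length must be a multiple of 3")
    out = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        out.append(b0 | ((b1 & 0x0F) << 8))
        out.append((b1 >> 4) | (b2 << 4))
    return out


def encaps_key_check(ek: bytes, k: int) -> bool:
    """Length check plus modulus check: every packed coefficient must be < Q."""
    if len(ek) != 384 * k + 32:
        return False
    return all(c < Q for c in decode12(ek[: 384 * k]))


def decaps_key_check(dk: bytes, k: int) -> bool:
    """Length check plus hash check: the stored H(ek) must match SHA3-256(ek)."""
    if len(dk) != 768 * k + 96:
        return False
    ek = dk[384 * k : 768 * k + 32]
    stored_hash = dk[768 * k + 32 : 768 * k + 64]
    return hashlib.sha3_256(ek).digest() == stored_hash
```

The modulus check is written here as a direct comparison against `Q`; FIPS 203 phrases it as decoding and re-encoding the key and comparing the result with the input, which accepts exactly the same keys.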

TODOs:

Sync with the upcoming release version of mlkem-native
Update constant-time tests
Update documentation

Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.)
Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged.)

baentsch

Thanks for the PR, @bhess. I certainly didn't check all 540 files but focused on the integration logic; please see the individual comments. In general, the patch is way too large in my opinion: couldn't the upstream use fewer hard-coded include paths and also provide YML documentation of its implementation? Ideally, "copy_from_upstream" should be easy to run regularly to follow the upstream without always having to create new patches; the latter only creates unnecessary work for OQS and consequently reduces the motivation to keep the code up to date. Of course, if no further development is expected in PQCP (is it?), this point is moot.

bhess · 2025-01-22T15:57:23Z

Thanks for the review, @baentsch. The patch is now much smaller: it basically only adapts a few things so that we can use our fips202/sha3 implementation. For the upstream implementation, moving away from relative include paths does not seem straightforward. However, this is no longer an issue because I’ve added an option to copy_from_upstream that preserves the upstream folder structure. As a result, no further patching is required.
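To illustrate why preserving the folder structure removes the need to patch include paths, here is a rough Python sketch. It is not the actual copy_from_upstream code; the function name and parameters are hypothetical. The idea is that a file copied to the same relative location as upstream can keep its relative `#include "../..."` lines unchanged, whereas flattening all sources into one directory would force every such include to be rewritten.

```python
import shutil
from pathlib import Path


def copy_sources(upstream_root: Path, dest_root: Path,
                 sources: list[str], preserve_structure: bool) -> list[Path]:
    """Copy the listed upstream files (paths relative to upstream_root).

    With preserve_structure=True each file keeps its relative path under
    dest_root; otherwise all files land directly in dest_root (flattened).
    Hypothetical helper for illustration only.
    """
    copied = []
    for rel in sources:
        src = upstream_root / rel
        dst = dest_root / rel if preserve_structure else dest_root / Path(rel).name
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(dst)
    return copied
```

A flattened copy also risks name collisions when two upstream directories contain files with the same name, which the structure-preserving variant avoids.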

Comments addressed. Discussion ongoing. Don't want to hinder other approvals moving things forward.

SWilson4

I haven't attempted to review the code imported from PQCP (and I wouldn't have the expertise to do so anyhow), but the integration-related code looks good to me.

SWilson4 · 2025-01-23T15:07:10Z

src/common/pqclean_shims/fips202.h

Maybe it's time to rename this file to "upstream_shims" or something similar to reflect the fact that it's no longer exclusive to PQClean?

@SWilson4 In my "final" review I came across this unresolved comment. As you opened it, please close it before merge, I guess by (or before) opening a separate issue so that your proposal above doesn't get forgotten.

baentsch · 2025-01-24T07:45:54Z

Quick additional question: Could you share a performance comparison run on the same machine "before-after", @bhess? Should help avoid things like #2047. Thanks in advance!

bhess · 2025-01-24T09:33:18Z

The following measurements are on an Intel Xeon Gold 6338 CPU @ 2.00GHz, Turbo Boost turned off for consistent results:

1. Generic implementation

Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  SSE SSE2

1.1 Old implementation from main

Speed test
==========
Started at 2025-01-24 09:09:54
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |      54132 |          3.000 |          55.420 |     10.625 |                    110759 |      21232
encaps                               |      47809 |          3.000 |          62.751 |      0.584 |                    125417 |        800
decaps                               |      37771 |          3.000 |          79.428 |      0.748 |                    158772 |       1211
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |      33523 |          3.000 |          89.491 |     12.123 |                    178901 |      24227
encaps                               |      30855 |          3.000 |          97.232 |      0.715 |                    194379 |       1202
decaps                               |      24982 |          3.000 |         120.088 |      0.801 |                    240104 |       1376
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |      21620 |          3.000 |         138.764 |     14.656 |                    277453 |      29299
encaps                               |      20920 |          3.000 |         143.405 |      0.837 |                    286731 |       1475
decaps                               |      17398 |          3.000 |         172.439 |      0.858 |                    344799 |       1505

1.2 mlkem-native implementation

Started at 2025-01-24 09:11:46
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |      70652 |          3.000 |          42.462 |      8.883 |                     84854 |      17746
encaps                               |      65996 |          3.000 |          45.458 |      0.582 |                     90836 |        678
decaps                               |      54439 |          3.000 |          55.108 |      0.509 |                    110144 |        808
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |      43475 |          3.000 |          69.006 |     10.748 |                    137944 |      21482
encaps                               |      42360 |          3.000 |          70.823 |      0.783 |                    141566 |       1295
decaps                               |      35461 |          3.000 |          84.602 |      0.791 |                    169128 |       1312
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |      28875 |          3.000 |         103.900 |     12.978 |                    207728 |      25950
encaps                               |      28994 |          3.000 |         103.471 |      0.817 |                    206867 |       1409
decaps                               |      24693 |          3.000 |         121.496 |      0.869 |                    242923 |       1537

-> We see a nice speedup in the generic code: roughly 23-31% fewer CPU cycles across all parameter sets and operations.

2. Optimized implementation (Intel)

Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

2.1 Old implementation from main

Started at 2025-01-24 09:08:06
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     249319 |          3.000 |          12.033 |      4.876 |                     23992 |       9740
encaps                               |     237728 |          3.000 |          12.619 |      0.514 |                     25158 |        340
decaps                               |     262800 |          3.000 |          11.416 |      0.515 |                     22763 |        317
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     154496 |          3.000 |          19.418 |      5.696 |                     38764 |      11357
encaps                               |     156005 |          3.000 |          19.230 |      0.462 |                     38380 |        430
decaps                               |     163952 |          3.000 |          18.298 |      0.504 |                     36528 |        457
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     116438 |          3.000 |          25.765 |      6.358 |                     51462 |      12682
encaps                               |     116221 |          3.000 |          25.813 |      0.479 |                     51538 |        526
decaps                               |     119327 |          3.000 |          25.141 |      0.530 |                     50212 |        853

2.2 mlkem-native implementation

Started at 2025-01-24 09:13:17
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     247062 |          3.000 |          12.143 |      4.799 |                     24213 |       9580
encaps                               |     181257 |          3.000 |          16.551 |      3.701 |                     33013 |       7367
decaps                               |     154251 |          3.000 |          19.449 |      0.740 |                     38825 |       1103
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     150155 |          3.000 |          19.979 |      5.763 |                     39889 |      11505
encaps                               |     141092 |          3.000 |          21.263 |      0.497 |                     42443 |        503
decaps                               |     112058 |          3.000 |          26.772 |      0.499 |                     53474 |        515
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     113681 |          3.000 |          26.390 |      6.629 |                     52710 |      13232
encaps                               |     103268 |          3.000 |          29.051 |      0.358 |                     58029 |        532
decaps                               |      81610 |          3.000 |          36.760 |      0.545 |                     73446 |        652

-> The key generation performance is very similar, but there's some performance degradation in encapsulation/decapsulation: roughly 11-31% more CPU cycles for encaps and 46-71% more for decaps. This can likely be attributed to the additional key checks implemented in mlkem-native to meet FIPS 203 requirements, which are more noticeable in the otherwise optimized code. Feedback from @mkannwischer would be appreciated to confirm whether this aligns with your expectations.

mkannwischer · 2025-01-24T14:59:45Z

Thanks for the benchmarks.
No, this is weird. The performance impact of input validation is expected to be around 1% for encaps and maybe 20% for decaps. That doesn't match what you are seeing, so something else must be going on in addition.
I was able to reproduce some of the weirdness you are seeing on a Cascade Lake just now. I will get back to you once I have found out what's going on there.

hanno-becker · 2025-01-28T09:43:36Z

Apologies for the delay, we did some analysis and experiments in the background to understand the performance numbers reported by @bhess better.

The impact of input validation is surprisingly large: We see up to 5% for encapsulation and 30% for decapsulation. Still, it does not explain all of the performance drop.

mlkem-native is not adopting all AVX2 code from the pqcrystals repo, a conscious decision to limit the verification burden. In light of the above numbers, however, we revisited what merits implementing in intrinsics, and imported AVX2 intrinsics code for polynomial (de)compression routines from pqcrystals (with some robustness improvements).

Below are the current performance numbers of OQS main, mlkem-native main without input validation (IV), and mlkem-native main with IV. They were measured on a c6i.metal (Ice Lake) with Turbo Boost disabled, one of the platforms where we observed particularly large performance deviations compared to pqcrystals.

The performance is now close to the pqcrystals performance when discounting input validation. @bhess Could you also re-run the benchmarks on your machine?

=======================================
======== liboqs `main` ================
=======================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       6a16ac68b59423b4b0068f9faf97cc5162f6d453
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3


Operation    | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean | pop. stdev
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-512   |            |                |                 |            |                  |
keygen       |     368161 |          3.000 |           8.149 |      2.537 |            23565 |       7297
encaps       |     348400 |          3.000 |           8.611 |      0.539 |            24896 |        657
decaps       |     387285 |          3.000 |           7.746 |      0.485 |            22397 |        579
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-768   |            |                |                 |            |                  |
keygen       |     225244 |          3.000 |          13.319 |      3.093 |            38559 |       8888
encaps       |     227425 |          3.000 |          13.191 |      0.436 |            38179 |        661
decaps       |     241123 |          3.000 |          12.442 |      0.534 |            36014 |        607
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-1024  |            |                |                 |            |                  |
keygen       |     171524 |          3.000 |          17.490 |      3.635 |            50656 |      10459
encaps       |     170472 |          3.000 |          17.598 |      0.587 |            50949 |        934
decaps       |     175943 |          3.000 |          17.051 |      0.349 |            49382 |        852

==================================================================================
======== mlkem-native `test_no_iv`, (= `main` without input validation) ==========
==================================================================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       644a7ceeeb8c254fc56c3f7b81be27e2c0e38551 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Operation   | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean  | pop. stdev
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-512  |            |                |                 |            |                   |
keygen      |     357424 |          3.000 |           8.393 |      2.580 |             24273 |       7357
encaps      |     327128 |          3.000 |           9.171 |      0.401 |             26524 |        498
decaps      |     358303 |          3.000 |           8.373 |      0.505 |             24215 |        479
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-768  |            |                |                 |            |                   |
keygen      |     220019 |          3.000 |          13.635 |      3.194 |             39476 |       9155
encaps      |     217060 |          3.000 |          13.821 |      0.456 |             40004 |        632
decaps      |     225688 |          3.000 |          13.293 |      0.494 |             38482 |        639
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-1024 |            |                |                 |            |                   |
keygen      |     167219 |          3.000 |          17.941 |      3.571 |             51958 |      10298
encaps      |     162798 |          3.000 |          18.428 |      0.554 |             53358 |        784
decaps      |     166341 |          3.000 |          18.035 |      0.300 |             52234 |        681

=======================================
======== mlkem-native `main` ==========
=======================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       644a7ceeeb8c254fc56c3f7b81be27e2c0e38551 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Operation    | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean  | pop. stdev
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-512   |            |                |                 |            |                   |
keygen       |     355541 |          3.000 |           8.438 |      2.472 |             24358 |       7030
encaps       |     325140 |          3.000 |           9.227 |      0.490 |             26654 |        803
decaps       |     275036 |          3.000 |          10.908 |      0.372 |             31520 |        573
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-768   |            |                |                 |            |                   |
keygen       |     218869 |          3.000 |          13.707 |      3.131 |             39682 |       8980
encaps       |     214116 |          3.000 |          14.011 |      0.285 |             40554 |        642
decaps       |     175645 |          3.000 |          17.080 |      0.323 |             49464 |        663
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-1024  |            |                |                 |            |                   |
keygen       |     167744 |          3.000 |          17.884 |      3.697 |             51794 |      10653
encaps       |     160844 |          3.000 |          18.652 |      0.554 |             54000 |        787
decaps       |     130757 |          3.000 |          22.943 |      0.405 |             66465 |        788

Signed-off-by: Basil Hess <[email protected]>

Pull update [full tests]
Signed-off-by: Basil Hess <[email protected]>

Update CT files [full tests] [extended tests]
Signed-off-by: Basil Hess <[email protected]>

Update filter_algs [full tests]
Signed-off-by: Basil Hess <[email protected]>

Switch to upstream repo with patches [full tests]
Signed-off-by: Basil Hess <[email protected]>

Update README.md [skip ci]
Signed-off-by: Basil Hess <[email protected]>

New import
Copy-from-upstream option to preserve folder stucture
Smaller patch: no include paths fixing & meta-ymls available upstream
Documenting ct-passes file
Update dependencies for CBOM
[full tests] [extended tests]
Signed-off-by: Basil Hess <[email protected]>

Correct upstream branch [full tests] [extended tests]
Signed-off-by: Basil Hess <[email protected]>

Pull new update with fips202 context initialization [full tests] [extended tests]
Signed-off-by: Basil Hess <[email protected]>

pull from upstream [full tests] [extended tests]
Signed-off-by: Basil Hess <[email protected]>

bhess · 2025-01-28T12:08:05Z

Thank you @hanno-becker for the updates in mlkem-native and the thorough analysis. I've updated the PR to sync with commit 84398e7230fa31ba4241f5eb36bdc3c1dbbd5bcd from upstream.

The performance numbers on my machine are consistent with those on your Ice Lake instance.

Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       e9f23a9d2fe12ace87d19a4cd13412e8340403bb (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-28 10:00:31
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     241471 |          3.000 |          12.424 |      4.979 |                     24774 |       9923
encaps                               |     222681 |          3.000 |          13.472 |      1.041 |                     26856 |       1831
decaps                               |     189715 |          3.000 |          15.813 |      0.448 |                     31554 |        414
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     150538 |          3.000 |          19.929 |      5.889 |                     39781 |      11756
encaps                               |     146821 |          3.000 |          20.433 |      0.546 |                     40790 |        478
decaps                               |     121462 |          3.000 |          24.699 |      0.528 |                     49327 |        504
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     114708 |          3.000 |          26.153 |      6.907 |                     52233 |      13803
encaps                               |     110465 |          3.000 |          27.158 |      0.466 |                     54228 |        646
decaps                               |      89447 |          3.000 |          33.540 |      0.598 |                     66999 |        667
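The "consistent" claim can be cross-checked row by row. The sketch below is a minimal illustration, not part of the PR: it copies the mean CPU cycle counts from the table headed mlkem-native `main` earlier in the thread and from the table in this comment, and prints the relative gap per operation. The names `UPSTREAM_MAIN`, `LIBOQS_PR` and `rel_diff` are made up for the example. Cycle counts are compared rather than microseconds, because the mean times differ between the two tables while the cycle counts line up.

```python
# Mean CPU cycles copied from the two tables in this thread.
# Keys are (parameter set, operation).
UPSTREAM_MAIN = {  # table headed "mlkem-native `main`"
    ("ML-KEM-512", "keygen"): 24358,
    ("ML-KEM-512", "encaps"): 26654,
    ("ML-KEM-512", "decaps"): 31520,
    ("ML-KEM-768", "keygen"): 39682,
    ("ML-KEM-768", "encaps"): 40554,
    ("ML-KEM-768", "decaps"): 49464,
    ("ML-KEM-1024", "keygen"): 51794,
    ("ML-KEM-1024", "encaps"): 54000,
    ("ML-KEM-1024", "decaps"): 66465,
}
LIBOQS_PR = {  # table in the comment dated 2025-01-28
    ("ML-KEM-512", "keygen"): 24774,
    ("ML-KEM-512", "encaps"): 26856,
    ("ML-KEM-512", "decaps"): 31554,
    ("ML-KEM-768", "keygen"): 39781,
    ("ML-KEM-768", "encaps"): 40790,
    ("ML-KEM-768", "decaps"): 49327,
    ("ML-KEM-1024", "keygen"): 52233,
    ("ML-KEM-1024", "encaps"): 54228,
    ("ML-KEM-1024", "decaps"): 66999,
}


def rel_diff(a: int, b: int) -> float:
    """Relative difference of b with respect to a, as a fraction."""
    return (b - a) / a


# Per-row relative gap between the two runs.
diffs = {key: rel_diff(UPSTREAM_MAIN[key], LIBOQS_PR[key]) for key in UPSTREAM_MAIN}
worst_key = max(diffs, key=lambda k: abs(diffs[k]))

for (param_set, op), d in diffs.items():
    print(f"{param_set:12s} {op:7s} {d:+.2%}")
print("largest gap:", worst_key, f"{diffs[worst_key]:+.2%}")
```

With these inputs every row differs by less than 2%, and the largest gap is ML-KEM-512 keygen.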

Regarding the performance drop due to input validation, I believe this is a tradeoff we should accept in order to align with the ML-KEM/FIPS 203 requirements. If we discount the overhead from input validation, the performance drop compared to pqcrystals is well under 10% for encapsulation/decapsulation, which is within the 15% tolerance defined in the liboqs release process. This also does not appear to be a regression introduced by the liboqs integration, as the same characteristic is visible in standalone mlkem-native. Considering that the new implementation also benefits from formal verification, I think this tradeoff is well worth making. Any thoughts or objections on this?
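A tolerance check of this kind is simple arithmetic. The sketch below is illustrative only: `slowdown` and `within_tolerance` are hypothetical helpers, the 15% threshold is the figure quoted above, and the baseline numbers are the ML-KEM-1024 cycle counts from the table posted just before the mlkem-native `main` block. Treating that table as the baseline is an assumption about which run it reports.

```python
TOLERANCE = 0.15  # 15% figure quoted from the liboqs release process


def slowdown(baseline_cycles: int, candidate_cycles: int) -> float:
    """Fractional increase in cycles relative to the baseline (negative = faster)."""
    return (candidate_cycles - baseline_cycles) / baseline_cycles


def within_tolerance(baseline_cycles: int, candidate_cycles: int,
                     tolerance: float = TOLERANCE) -> bool:
    """True if the candidate is no more than `tolerance` slower than the baseline."""
    return slowdown(baseline_cycles, candidate_cycles) <= tolerance


# ML-KEM-1024 mean CPU cycles: (assumed baseline table, mlkem-native `main` table)
rows = {
    "keygen": (51958, 51794),
    "encaps": (53358, 54000),
    "decaps": (52234, 66465),
}

for op, (base, cand) in rows.items():
    print(f"{op}: {slowdown(base, cand):+.1%} within tolerance: {within_tolerance(base, cand)}")
```

These are raw figures that include any input-validation cost. The "well under 10%" figure above is stated after discounting that overhead, which this sketch does not attempt to separate out.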

hanno-becker · 2025-01-29T10:21:48Z

Thank you, @bhess! There are CI failures, but they seem unrelated to this PR?

@praveksharma @dstebila @baentsch @SWilson4 Anything else needed, or are we good to go?

bhess · 2025-01-29T11:26:58Z

> There are CI failures, but they seem unrelated to this PR?

I rebased to main. It looks like the CUDA build is also failing on the main branch. @praveksharma, do you have any insights on why that might be?

praveksharma · 2025-01-29T12:24:40Z

> I rebased to main. It looks like the CUDA build is also failing on the main branch. @praveksharma, do you have any insights on why that might be?

This is likely happening because of a misconfigured CI image, due to which CMake cannot find the CUDA compiler; here is the relevant issue: #2056. The issue should not affect this PR.

baentsch · 2025-01-29T13:00:56Z

> As a result, no further patching is required.

Hmm -- when looking at the current PR, I still see a patch file: How does this fit with the above statement @bhess?

docs/algorithms/kem/ml_kem.md

baentsch · 2025-01-29T13:13:35Z

> @praveksharma @dstebila @baentsch @SWilson4 Anything else needed, or are we good to go?

Basically LGTM. Some (I hope only) housekeeping comments I'd suggest addressing before merge, tagging @bhess.

Thanks @hanno-becker @mkannwischer for helping us streamline the OQS "upstream zoo" a bit!

hanno-becker · 2025-01-29T15:30:30Z

Thank you, @baentsch, for the careful review -- good calls regarding the patch size and benchmarks!

bhess · 2025-01-29T15:51:40Z

Thank you for the review @baentsch !

> > As a result, no further patching is required.
>
> Hmm -- when looking at the current PR, I still see a patch file: How does this fit with the above statement @bhess?

The context of my statement was the following:

> The patch size is now much reduced, basically only to adapt a few things to be able to use our fips202/sha3 implementation...

The patch mainly adjusts the include paths from fips202/fips202.h to fips202.h to work with our common implementation. I considered this small change to be manageable as a patch, but you're right that the patch still exists in the PR. All further patches could be removed, though.
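To make the kind of change concrete, here is a minimal sketch of the include-path adjustment described above, expressed as a text rewrite. It is not the patch file from the PR: the function name `flatten_fips202_include` is hypothetical, and `params.h` in the sample input is only a placeholder for some unrelated include.

```python
import re

# Matches an include of "fips202/fips202.h" and captures the parts to keep.
_FIPS202_INCLUDE = re.compile(
    r'^(\s*#\s*include\s*")fips202/(fips202\.h")', re.MULTILINE
)


def flatten_fips202_include(source: str) -> str:
    """Rewrite `#include "fips202/fips202.h"` to `#include "fips202.h"`."""
    return _FIPS202_INCLUDE.sub(r"\1\2", source)


# Sample input; "params.h" is a placeholder and must be left untouched.
example = '#include "fips202/fips202.h"\n#include "params.h"\n'
print(flatten_fips202_include(example), end="")
```

Only the directory prefix is dropped, so other includes in the same file are left as they are.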

@hanno-becker: Once the PR receives a second approval, would it be possible to create a tag or release upstream? This way, we could reference the tag/release instead of pointing to a specific commit in the main branch, as we're doing currently.

hanno-becker · 2025-01-30T10:16:16Z

@bhess Yes, that will be fine. Once we have the second approval here, let's sync once more on what upstream commit to update this PR to, and then tag that.

hanno-becker · 2025-01-30T19:15:04Z

@bhess Could you update to the latest main of mlkem-native? Once it passes your CI as well, we'll tag it as v1.0.0-alpha2.

@baentsch @praveksharma @dstebila If you're happy, could one of you provide the 2nd approval?

hanno-becker · 2025-01-31T09:02:37Z

@bhess We created a tag v1.0.0-alpha.2. Could you update, hopefully a final time?

bhess · 2025-01-31T11:34:50Z

> @bhess We created a tag v1.0.0-alpha.2. Could you update, hopefully a final time?

Thank you @hanno-becker! The PR is updated.

hanno-becker · 2025-02-01T04:42:29Z

@praveksharma @baentsch @dstebila Are we good to go?

docs/algorithms/kem/ml_kem.yml

bhess force-pushed the bhe-mlkem-native branch from 7f66f23 to 274d30c on January 13, 2025 13:50

bhess mentioned this pull request Jan 13, 2025

Integrate mlkem-native into libOQS pq-code-package/mlkem-native#653 (Open)

baentsch mentioned this pull request Jan 16, 2025

NVIDIA: Adding cuPQC as a backend for ML-KEM. #2044 (Merged)

planetf1 mentioned this pull request Jan 16, 2025

POC of providing algorithm for liboqs pq-code-package/tsc#103 (Open)

SWilson4 reviewed Jan 20, 2025

.CMake/alg_support.cmake (resolved)

bhess force-pushed the bhe-mlkem-native branch from 959c697 to 1eebf31 on January 21, 2025 12:54

bhess marked this pull request as ready for review January 21, 2025 16:24

bhess requested review from dstebila, baentsch, alexrow and praveksharma as code owners January 21, 2025 16:24

baentsch previously requested changes Jan 21, 2025

docs/cbom.json (outdated, resolved)

scripts/copy_from_upstream/patches/mlkem-native.patch (outdated, resolved)

tests/constant_time/kem/passes/ml_kem (outdated, resolved)

bhess force-pushed the bhe-mlkem-native branch from cb07260 to d199bb3 on January 22, 2025 15:42

baentsch reviewed Jan 22, 2025

scripts/copy_from_upstream/patches/mlkem-native.patch (outdated, resolved)

hanno-becker mentioned this pull request Jan 22, 2025

FIPS202: Add shake128[x4]_init() to FIPS202 API pq-code-package/mlkem-native#686 (Merged)

baentsch mentioned this pull request Jan 23, 2025

Make CBOM actually document dependencies #2048 (Open)

baentsch self-requested a review January 23, 2025 08:12

SWilson4 reviewed Jan 23, 2025

SWilson4 approved these changes Jan 23, 2025

Commit after rebase [full tests] [extended tests]

e7fc57b

Signed-off-by: Basil Hess <[email protected]>

bhess force-pushed the bhe-mlkem-native branch from e9f23a9 to e7fc57b on January 29, 2025 08:51

bhess mentioned this pull request Jan 29, 2025

Extended API for More Efficient Key Validation #2060 (Open)

baentsch reviewed Jan 29, 2025

docs/algorithms/kem/ml_kem.md (outdated, resolved)

correct patch name for docs

bb69318

Signed-off-by: Basil Hess <[email protected]>

pull update [run tests] [extended tests]

df95edc

Signed-off-by: Basil Hess <[email protected]>

Pull v1.0.0-alpha.2 [full tests] [extended tests]

7a8ec42

Signed-off-by: Basil Hess <[email protected]>

baentsch mentioned this pull request Jan 31, 2025

Adding OpenSSL CLA requirement pq-code-package/tsc#113 (Open)

dstebila reviewed Feb 2, 2025

docs/algorithms/kem/ml_kem.yml (resolved)

dstebila approved these changes Feb 2, 2025

baentsch mentioned this pull request Feb 3, 2025

Add checks for ML-KEM keys #2009 (Open)
