Import ML-KEM from mlkem-native/PQ code package #2041

Open: 5 commits to be merged into base: main

@bhess bhess commented Jan 13, 2025

This PR tracks the integration of ML-KEM from the mlkem-native upstream repository.
It replaces the current ML-KEM implementation in liboqs, which was previously imported from pq-crystals, with the mlkem-native implementation from PQCP.

Some features of mlkem-native:

  • Portable C implementation (C90 compliant)
  • Optimized implementation for x86_64
  • Optimized implementation for ARM64
  • Formal verification

The upstream code recently had a v1.0.0-alpha release and is actively maintained. The goal is to synchronize the PR with an upcoming tagged release of mlkem-native.

Additionally, the upstream code by default includes the enhanced key validation defined by FIPS 203, which resolves issue #1951.

Closes #1951.
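
For readers unfamiliar with the FIPS 203 input checks referred to above, here is a minimal Python sketch of what such key validation involves: a length and modulus check on the encapsulation key, and a length and hash check on the decapsulation key. It is written from the standard's description and is not code from mlkem-native or liboqs. The function names are hypothetical, and `k` is the module rank (2, 3, 4 for ML-KEM-512/768/1024).

```python
import hashlib

Q = 3329  # ML-KEM modulus


def check_encaps_key(ek: bytes, k: int) -> bool:
    """Length check plus modulus check: every 12-bit coefficient must be < Q."""
    if len(ek) != 384 * k + 32:
        return False
    body = ek[: 384 * k]  # k polynomials, 256 coefficients of 12 bits each
    for i in range(0, len(body), 3):
        b0, b1, b2 = body[i], body[i + 1], body[i + 2]
        c0 = b0 | ((b1 & 0x0F) << 8)  # first coefficient of the 3-byte group
        c1 = (b1 >> 4) | (b2 << 4)    # second coefficient of the 3-byte group
        if c0 >= Q or c1 >= Q:
            return False
    return True


def check_decaps_key(dk: bytes, k: int) -> bool:
    """Length check plus hash check: the stored H(ek) must equal SHA3-256(ek)."""
    if len(dk) != 768 * k + 96:
        return False
    ek = dk[384 * k : 768 * k + 32]
    stored = dk[768 * k + 32 : 768 * k + 64]
    return hashlib.sha3_256(ek).digest() == stored


# Toy inputs (not real keys): all-zero coefficients pass, 0xFFF does not.
k = 3
good_ek = bytes(384 * k + 32)
bad_ek = b"\xff\xff\xff" + good_ek[3:]
good_dk = bytes(384 * k) + good_ek + hashlib.sha3_256(good_ek).digest() + bytes(32)
print(check_encaps_key(good_ek, k), check_encaps_key(bad_ek, k), check_decaps_key(good_dk, k))
```

The modulus check is equivalent to re-encoding the decoded coefficients and comparing against the input, since any coefficient of 3329 or more would be reduced on decoding.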

TODOs:

  • Sync with the upcoming release version of mlkem-native
  • Update constant-time tests
  • Update documentation
  • Does this PR change the input/output behaviour of a cryptographic algorithm (i.e., does it change known answer test values)? (If so, a version bump will be required from x.y.z to x.(y+1).0.)
  • Does this PR change the list of algorithms available -- either adding, removing, or renaming? Does this PR otherwise change an API? (If so, PRs in fully supported downstream projects dependent on these, i.e., oqs-provider will also need to be ready for review and merge by the time this is merged.)

@bhess bhess marked this pull request as ready for review January 21, 2025 16:24
@baentsch baentsch left a comment

Thanks for the PR, @bhess. I certainly didn't check all 540 files but focused on the integration logic; please see the individual comments. In general, the patch is far too large in my opinion: couldn't the upstream use fewer hard-coded include paths and also provide YML documentation of their implementation? Ideally, "copy_from_upstream" should be easy to run so that we can regularly follow the upstream without always having to create new patches; the latter only creates unnecessary work for OQS and consequently reduces the motivation for keeping the code up to date. Of course, if no further development is expected in PQCP (is it?), this point is moot.

Review threads (outdated, resolved):
docs/cbom.json
scripts/copy_from_upstream/patches/mlkem-native.patch
tests/constant_time/kem/passes/ml_kem
bhess commented Jan 22, 2025

Thanks for the review, @baentsch. The patch is now much smaller; it basically only adapts a few things so that our fips202/sha3 implementation can be used. For the upstream implementation, moving away from relative import paths does not seem straightforward. However, this is no longer an issue because I’ve added an option to copy_from_upstream that preserves the upstream folder structure. As a result, no further patching is required.

@baentsch baentsch self-requested a review January 23, 2025 08:12
@baentsch baentsch dismissed their stale review January 23, 2025 08:13

Comments addressed. Discussion ongoing. Don't want to hinder other approvals moving things forward.

@SWilson4 SWilson4 left a comment

I haven't attempted to review the code imported from PQCP (and I wouldn't have the expertise to do so anyhow), but the integration-related code looks good to me.

Maybe it's time to rename this file to "upstream_shims" or something similar to reflect the fact that it's no longer exclusive to PQClean?

@SWilson4 In my "final" review I came across this unresolved comment: As you opened it, please close it before merge -- I guess by/before adding a separate issue so that your proposal above doesn't get forgotten.

@baentsch

Quick additional question: Could you share a performance comparison run on the same machine "before-after", @bhess? Should help avoid things like #2047. Thanks in advance!

bhess commented Jan 24, 2025

The following measurements are on an Intel Xeon Gold 6338 CPU @ 2.00GHz, Turbo Boost turned off for consistent results:

  1. Generic implementation
Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  SSE SSE2

1.1. Old implementation from main

Speed test
==========
Started at 2025-01-24 09:09:54
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |      54132 |          3.000 |          55.420 |     10.625 |                    110759 |      21232
encaps                               |      47809 |          3.000 |          62.751 |      0.584 |                    125417 |        800
decaps                               |      37771 |          3.000 |          79.428 |      0.748 |                    158772 |       1211
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |      33523 |          3.000 |          89.491 |     12.123 |                    178901 |      24227
encaps                               |      30855 |          3.000 |          97.232 |      0.715 |                    194379 |       1202
decaps                               |      24982 |          3.000 |         120.088 |      0.801 |                    240104 |       1376
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |      21620 |          3.000 |         138.764 |     14.656 |                    277453 |      29299
encaps                               |      20920 |          3.000 |         143.405 |      0.837 |                    286731 |       1475
decaps                               |      17398 |          3.000 |         172.439 |      0.858 |                    344799 |       1505

1.2 mlkem-native implementation

Started at 2025-01-24 09:11:46
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |      70652 |          3.000 |          42.462 |      8.883 |                     84854 |      17746
encaps                               |      65996 |          3.000 |          45.458 |      0.582 |                     90836 |        678
decaps                               |      54439 |          3.000 |          55.108 |      0.509 |                    110144 |        808
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |      43475 |          3.000 |          69.006 |     10.748 |                    137944 |      21482
encaps                               |      42360 |          3.000 |          70.823 |      0.783 |                    141566 |       1295
decaps                               |      35461 |          3.000 |          84.602 |      0.791 |                    169128 |       1312
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |      28875 |          3.000 |         103.900 |     12.978 |                    207728 |      25950
encaps                               |      28994 |          3.000 |         103.471 |      0.817 |                    206867 |       1409
decaps                               |      24693 |          3.000 |         121.496 |      0.869 |                    242923 |       1537

-> We see a nice speedup in the generic code: roughly 23-31% fewer CPU cycles across all parameter sets and operations.
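
The size of the speedup can be read off the "CPU cycles: mean" column. Below is a small sketch for doing so, assuming the pipe-delimited table layout shown above. The helper names are made up for illustration, and only the ML-KEM-768 rows from sections 1.1 and 1.2 are reproduced.

```python
OLD = """\
ML-KEM-768 |       |       |         |        |        |
keygen     | 33523 | 3.000 |  89.491 | 12.123 | 178901 | 24227
encaps     | 30855 | 3.000 |  97.232 |  0.715 | 194379 |  1202
decaps     | 24982 | 3.000 | 120.088 |  0.801 | 240104 |  1376
"""

NEW = """\
ML-KEM-768 |       |       |         |        |        |
keygen     | 43475 | 3.000 |  69.006 | 10.748 | 137944 | 21482
encaps     | 42360 | 3.000 |  70.823 |  0.783 | 141566 |  1295
decaps     | 35461 | 3.000 |  84.602 |  0.791 | 169128 |  1312
"""


def parse_speed_table(text):
    """Return {algorithm: {operation: mean_cycles}} from pipe-delimited rows."""
    results, current = {}, None
    for line in text.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # Skip anything that is not a 7-column row, plus header and separator rows.
        if len(cells) != 7 or cells[0].startswith("-") or cells[0] == "Operation":
            continue
        if cells[1] == "":  # algorithm header row, e.g. "ML-KEM-768 | | | ..."
            current = results.setdefault(cells[0], {})
        elif current is not None:
            current[cells[0]] = int(cells[5])  # column 6: mean CPU cycles
    return results


def cycle_change(old, new):
    """Percentage change in mean cycles (negative means the new code is faster)."""
    return {
        alg: {op: 100.0 * (new[alg][op] - cyc) / cyc for op, cyc in ops.items()}
        for alg, ops in old.items()
    }


change = cycle_change(parse_speed_table(OLD), parse_speed_table(NEW))
for op, pct in change["ML-KEM-768"].items():
    print(f"ML-KEM-768 {op}: {pct:+.1f}% cycles")
```

For ML-KEM-768 this works out to roughly 23%, 27% and 30% fewer cycles for keygen, encaps and decaps respectively.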

  2. Optimized implementation (Intel)
Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

2.1 Old implementation from main:

Started at 2025-01-24 09:08:06
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     249319 |          3.000 |          12.033 |      4.876 |                     23992 |       9740
encaps                               |     237728 |          3.000 |          12.619 |      0.514 |                     25158 |        340
decaps                               |     262800 |          3.000 |          11.416 |      0.515 |                     22763 |        317
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     154496 |          3.000 |          19.418 |      5.696 |                     38764 |      11357
encaps                               |     156005 |          3.000 |          19.230 |      0.462 |                     38380 |        430
decaps                               |     163952 |          3.000 |          18.298 |      0.504 |                     36528 |        457
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     116438 |          3.000 |          25.765 |      6.358 |                     51462 |      12682
encaps                               |     116221 |          3.000 |          25.813 |      0.479 |                     51538 |        526
decaps                               |     119327 |          3.000 |          25.141 |      0.530 |                     50212 |        853

2.2 mlkem-native implementation

Started at 2025-01-24 09:13:17
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     247062 |          3.000 |          12.143 |      4.799 |                     24213 |       9580
encaps                               |     181257 |          3.000 |          16.551 |      3.701 |                     33013 |       7367
decaps                               |     154251 |          3.000 |          19.449 |      0.740 |                     38825 |       1103
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     150155 |          3.000 |          19.979 |      5.763 |                     39889 |      11505
encaps                               |     141092 |          3.000 |          21.263 |      0.497 |                     42443 |        503
decaps                               |     112058 |          3.000 |          26.772 |      0.499 |                     53474 |        515
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     113681 |          3.000 |          26.390 |      6.629 |                     52710 |      13232
encaps                               |     103268 |          3.000 |          29.051 |      0.358 |                     58029 |        532
decaps                               |      81610 |          3.000 |          36.760 |      0.545 |                     73446 |        652

-> The key generation performance is very similar, but encapsulation and decapsulation are slower: encaps takes roughly 11-31% more cycles and decaps roughly 46-71% more, depending on the parameter set. This can likely be attributed to the additional key checks implemented in mlkem-native to meet FIPS 203 requirements, which are more noticeable in the otherwise optimized code. @mkannwischer, could you confirm whether this matches your expectations?

@mkannwischer

Thanks for the benchmarks.
No, this is weird. The performance impact of input validation is expected to be around 1% for encaps and maybe 20% for decaps. That doesn't match what you are seeing, so something else must be going on as well.
I was able to reproduce some of the weirdness you are seeing on a Cascade Lake machine just now. I will get back to you once I have found out what's going on there.

@hanno-becker

Apologies for the delay; we did some analysis and experiments in the background to better understand the performance numbers reported by @bhess.

The impact of input validation is surprisingly large: We see up to 5% for encapsulation and 30% for decapsulation. Still, it does not explain all of the performance drop.

mlkem-native does not adopt all of the AVX2 code from the pqcrystals repo, a conscious decision to limit the verification burden. In light of the above numbers, however, we revisited which routines merit an intrinsics implementation, and imported the AVX2 intrinsics code for the polynomial (de)compression routines from pqcrystals (with some robustness improvements).
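
As background on what the (de)compression routines compute, here is a scalar Python sketch of the per-coefficient Compress/Decompress maps as defined in FIPS 203. It only illustrates the arithmetic being vectorized. It is not the AVX2 code from pqcrystals or mlkem-native, and the function names are chosen for this example.

```python
Q = 3329  # ML-KEM modulus


def compress(x: int, d: int) -> int:
    """round((2^d / Q) * x) mod 2^d, computed with integer arithmetic."""
    return (((x << d) + Q // 2) // Q) % (1 << d)


def decompress(y: int, d: int) -> int:
    """round((Q / 2^d) * y), computed with integer arithmetic."""
    return (y * Q + (1 << (d - 1))) >> d


# ML-KEM uses d = 10 or 11 for the u part of the ciphertext, d = 4 or 5 for
# the v part, and d = 1 for message bits.
for d in (1, 4, 5, 10, 11):
    # Compressing a decompressed value returns the original value.
    assert all(compress(decompress(y, d), d) == y for y in range(1 << d))

print(compress(1665, 1), decompress(1, 1))
```

An implementation applies these maps to all 256 coefficients of each polynomial and then packs the d-bit results into bytes, which is why the routines show up in both encapsulation and decapsulation timings.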

Below are the current performance numbers of OQS main, mlkem-native main without input validation (IV), and mlkem-native main with IV. They were measured on a c6i.metal (Ice Lake) with Turbo Boost disabled, one of the platforms where we observed particularly large performance deviations compared to pqcrystals.

The performance is now close to the pqcrystals performance when discounting input validation. @bhess Could you also re-run the benchmarks on your machine?
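
To make "discounting input validation" concrete, the following sketch splits the cycle difference into the part caused by input validation and the remaining gap to liboqs `main`. The numbers are the mean CPU cycles for encaps and decaps of ML-KEM-512 and ML-KEM-768 copied from the three tables in this comment; the dictionary keys and helper names are made up for illustration.

```python
# Mean CPU cycles from the three tables in this comment (c6i.metal).
CYCLES = {
    "ML-KEM-512": {
        "encaps": {"oqs_main": 24896, "no_iv": 26524, "with_iv": 26654},
        "decaps": {"oqs_main": 22397, "no_iv": 24215, "with_iv": 31520},
    },
    "ML-KEM-768": {
        "encaps": {"oqs_main": 38179, "no_iv": 40004, "with_iv": 40554},
        "decaps": {"oqs_main": 36014, "no_iv": 38482, "with_iv": 49464},
    },
}


def pct(new, base):
    """Relative change of `new` versus `base`, in percent."""
    return 100.0 * (new - base) / base


def breakdown(c):
    """Split the slowdown into the IV cost and the gap that remains without IV."""
    return {
        "iv_overhead_pct": pct(c["with_iv"], c["no_iv"]),
        "gap_without_iv_pct": pct(c["no_iv"], c["oqs_main"]),
        "total_pct": pct(c["with_iv"], c["oqs_main"]),
    }


for alg, ops in CYCLES.items():
    for op, c in ops.items():
        b = breakdown(c)
        print(f"{alg} {op}: IV {b['iv_overhead_pct']:+.1f}%, "
              f"rest {b['gap_without_iv_pct']:+.1f}%, total {b['total_pct']:+.1f}%")
```

On these numbers, input validation costs roughly 0.5-1.4% for encapsulation and 28-30% for decapsulation, and the remaining gap to liboqs `main` is about 5-8%.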

=======================================
======== liboqs `main` ================
=======================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       6a16ac68b59423b4b0068f9faf97cc5162f6d453
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3


Operation    | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean | pop. stdev
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-512   |            |                |                 |            |                  |
keygen       |     368161 |          3.000 |           8.149 |      2.537 |            23565 |       7297
encaps       |     348400 |          3.000 |           8.611 |      0.539 |            24896 |        657
decaps       |     387285 |          3.000 |           7.746 |      0.485 |            22397 |        579
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-768   |            |                |                 |            |                  |
keygen       |     225244 |          3.000 |          13.319 |      3.093 |            38559 |       8888
encaps       |     227425 |          3.000 |          13.191 |      0.436 |            38179 |        661
decaps       |     241123 |          3.000 |          12.442 |      0.534 |            36014 |        607
------------ | ----------:| --------------:| ---------------:| ----------:| ----------------:| ----------:
ML-KEM-1024  |            |                |                 |            |                  |
keygen       |     171524 |          3.000 |          17.490 |      3.635 |            50656 |      10459
encaps       |     170472 |          3.000 |          17.598 |      0.587 |            50949 |        934
decaps       |     175943 |          3.000 |          17.051 |      0.349 |            49382 |        852

==================================================================================
======== mlkem-native `test_no_iv` (= `main` without input validation) ===========
==================================================================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       644a7ceeeb8c254fc56c3f7b81be27e2c0e38551 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Operation   | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean  | pop. stdev
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-512  |            |                |                 |            |                   |
keygen      |     357424 |          3.000 |           8.393 |      2.580 |             24273 |       7357
encaps      |     327128 |          3.000 |           9.171 |      0.401 |             26524 |        498
decaps      |     358303 |          3.000 |           8.373 |      0.505 |             24215 |        479
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-768  |            |                |                 |            |                   |
keygen      |     220019 |          3.000 |          13.635 |      3.194 |             39476 |       9155
encaps      |     217060 |          3.000 |          13.821 |      0.456 |             40004 |        632
decaps      |     225688 |          3.000 |          13.293 |      0.494 |             38482 |        639
----------- | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-1024 |            |                |                 |            |                   |
keygen      |     167219 |          3.000 |          17.941 |      3.571 |             51958 |      10298
encaps      |     162798 |          3.000 |          18.428 |      0.554 |             53358 |        784
decaps      |     166341 |          3.000 |          18.035 |      0.300 |             52234 |        681

=======================================
======== mlkem-native `main` ==========
=======================================

Configuration info
==================
Target platform:  x86_64-Linux-6.8.0-1021-aws
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       644a7ceeeb8c254fc56c3f7b81be27e2c0e38551 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.0.13 30 Jan 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Operation    | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean  | pop. stdev
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-512   |            |                |                 |            |                   |
keygen       |     355541 |          3.000 |           8.438 |      2.472 |             24358 |       7030
encaps       |     325140 |          3.000 |           9.227 |      0.490 |             26654 |        803
decaps       |     275036 |          3.000 |          10.908 |      0.372 |             31520 |        573
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-768   |            |                |                 |            |                   |
keygen       |     218869 |          3.000 |          13.707 |      3.131 |             39682 |       8980
encaps       |     214116 |          3.000 |          14.011 |      0.285 |             40554 |        642
decaps       |     175645 |          3.000 |          17.080 |      0.323 |             49464 |        663
------------ | ----------:| --------------:| ---------------:| ----------:| -----------------:| ----------:
ML-KEM-1024  |            |                |                 |            |                   |
keygen       |     167744 |          3.000 |          17.884 |      3.697 |             51794 |      10653
encaps       |     160844 |          3.000 |          18.652 |      0.554 |             54000 |        787
decaps       |     130757 |          3.000 |          22.943 |      0.405 |             66465 |        788

Signed-off-by: Basil Hess <[email protected]>

Pull update [full tests]

Signed-off-by: Basil Hess <[email protected]>

Update CT files [full tests] [extended tests]

Signed-off-by: Basil Hess <[email protected]>

Update filter_algs [full tests]

Signed-off-by: Basil Hess <[email protected]>

Switch to upstream repo with patches [full tests]

Signed-off-by: Basil Hess <[email protected]>

Update README.md [skip ci]

Signed-off-by: Basil Hess <[email protected]>

New import
Copy-from-upstream option to preserve folder structure
Smaller patch: no include paths fixing & meta-ymls available upstream
Documenting ct-passes file
Update dependencies for CBOM
[full tests] [extended tests]

Signed-off-by: Basil Hess <[email protected]>

Correct upstream branch [full tests] [extended tests]

Signed-off-by: Basil Hess <[email protected]>

Pull new update with fips202 context initialization [full tests] [extended tests]

Signed-off-by: Basil Hess <[email protected]>

pull from upstream [full tests] [extended tests]

Signed-off-by: Basil Hess <[email protected]>
@bhess
Member Author

bhess commented Jan 28, 2025

Thank you @hanno-becker for the updates in mlkem-native and the thorough analysis. I've updated the PR to sync with commit 84398e7230fa31ba4241f5eb36bdc3c1dbbd5bcd from upstream.

The performance numbers on my machine are consistent with those on your Ice Lake instance.

Configuration info
==================
Target platform:  x86_64-Linux-6.1.19
Compiler:         gcc (11.4.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       e9f23a9d2fe12ace87d19a4cd13412e8340403bb (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 AVX512 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-28 10:00:31
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
ML-KEM-512                           |            |                |                 |            |                           |           
keygen                               |     241471 |          3.000 |          12.424 |      4.979 |                     24774 |       9923
encaps                               |     222681 |          3.000 |          13.472 |      1.041 |                     26856 |       1831
decaps                               |     189715 |          3.000 |          15.813 |      0.448 |                     31554 |        414
ML-KEM-768                           |            |                |                 |            |                           |           
keygen                               |     150538 |          3.000 |          19.929 |      5.889 |                     39781 |      11756
encaps                               |     146821 |          3.000 |          20.433 |      0.546 |                     40790 |        478
decaps                               |     121462 |          3.000 |          24.699 |      0.528 |                     49327 |        504
ML-KEM-1024                          |            |                |                 |            |                           |           
keygen                               |     114708 |          3.000 |          26.153 |      6.907 |                     52233 |      13803
encaps                               |     110465 |          3.000 |          27.158 |      0.466 |                     54228 |        646
decaps                               |      89447 |          3.000 |          33.540 |      0.598 |                     66999 |        667
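
The mean times in this run are roughly 1.4 to 1.5 times those of the `x86_64-Linux-6.8.0-1021-aws` run quoted earlier, while the mean cycle counts differ by less than 2%. Dividing mean cycles by mean microseconds gives the effective cycle-counter frequency in MHz, which accounts for the gap. The following is a minimal sketch using the ML-KEM-512 rows of the two mlkem-native tables; the labels `aws` and `local`, the dictionary layout and the function names are illustrative only.

```python
# Mean time (us) and mean CPU cycles for ML-KEM-512, copied from the two
# mlkem-native benchmark tables in this thread. "aws" and "local" are
# illustrative labels, not names used by liboqs.
ROWS = {
    "aws": {"keygen": (8.438, 24358), "encaps": (9.227, 26654), "decaps": (10.908, 31520)},
    "local": {"keygen": (12.424, 24774), "encaps": (13.472, 26856), "decaps": (15.813, 31554)},
}


def cycles_per_us(mean_us, mean_cycles):
    """Effective cycle-counter frequency in MHz (cycles per microsecond)."""
    return mean_cycles / mean_us


def relative_diff(reference, other):
    """Signed relative difference of `other` with respect to `reference`."""
    return (other - reference) / reference


for op in ("keygen", "encaps", "decaps"):
    aws_us, aws_cycles = ROWS["aws"][op]
    local_us, local_cycles = ROWS["local"][op]
    print(
        f"{op}: aws {cycles_per_us(aws_us, aws_cycles):.0f} MHz, "
        f"local {cycles_per_us(local_us, local_cycles):.0f} MHz, "
        f"cycle difference {relative_diff(aws_cycles, local_cycles):+.2%}"
    )
```

Comparing cycle counts rather than wall-clock times is what makes results from machines with different clock rates comparable.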

Regarding the performance drop due to input validation, I believe this is a tradeoff we should accept in order to align with the ML-KEM/FIPS 203 requirements. If we discount the overhead from input validation, the performance drop compared to pqcrystals is well under 10% for encapsulation/decapsulation, which is within the 15% tolerance defined in the liboqs release process. This also does not appear to be a regression introduced by the liboqs integration, as the same characteristics are visible in standalone mlkem-native. Considering that the new implementation also benefits from formal verification, I think this tradeoff is well worth making. Any thoughts or objections on this?
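
The comparison against the 15% tolerance can be made concrete with a small calculation. The following is a minimal sketch, not part of the liboqs release tooling: it computes the relative change in mean CPU cycles between two benchmark runs and reports the operations that exceed a tolerance. The sample values are the ML-KEM-768 rows of the two `x86_64-Linux-6.8.0-1021-aws` tables earlier in this thread. Treating the first of those tables as the baseline is an assumption, and the figures include whatever input validation each build performs.

```python
TOLERANCE = 0.15  # the 15% tolerance mentioned in the comment above

# Mean CPU cycles for ML-KEM-768. Which build the first table represents is
# an assumption here; it is used as the baseline for illustration only.
baseline = {"keygen": 39476, "encaps": 40004, "decaps": 38482}
candidate = {"keygen": 39682, "encaps": 40554, "decaps": 49464}


def slowdown(baseline_cycles, candidate_cycles):
    """Relative increase in cycles; positive means the candidate is slower."""
    return (candidate_cycles - baseline_cycles) / baseline_cycles


def over_tolerance(base, cand, tolerance=TOLERANCE):
    """Return {operation: slowdown} for operations slower than `tolerance`."""
    result = {}
    for op, base_cycles in base.items():
        change = slowdown(base_cycles, cand[op])
        if change > tolerance:
            result[op] = change
    return result


for op in baseline:
    print(f"{op}: {slowdown(baseline[op], candidate[op]):+.2%}")
print("over tolerance:", sorted(over_tolerance(baseline, candidate)))
```

With these sample values only `decaps` is reported. The figures do not separate out the input validation overhead, which is the part the comment above proposes to discount.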

@hanno-becker

Thank you, @bhess! There are CI failures, but they seem unrelated to this PR?

@praveksharma @dstebila @baentsch @SWilson4 Anything else needed, or are we good to go?

@bhess
Member Author

bhess commented Jan 29, 2025

There are CI failures, but they seem unrelated to this PR?

I rebased to main. It looks like the CUDA build is also failing on the main branch. @praveksharma, do you have any insights on why that might be?

@praveksharma
Member

I rebased to main. It looks like the CUDA build is also failing on the main branch. @praveksharma, do you have any insights on why that might be?

This is likely happening because of a misconfigured CI image that prevents CMake from finding the CUDA compiler; here is the relevant issue: #2056. The issue should not affect this PR.

@baentsch
Member

As a result, no further patching is required.

Hmm -- when looking at the current PR, I still see a patch file: How does this fit with the above statement @bhess?

@baentsch
Member

@praveksharma @dstebila @baentsch @SWilson4 Anything else needed, or are we good to go?

Basically LGTM. Some (I hope only) housekeeping comments I'd suggest addressing before merge, tagging @bhess.

Thanks @hanno-becker @mkannwischer for helping us streamline the OQS "upstream zoo" a bit!

@hanno-becker

Thank you, @baentsch, for the careful review -- good calls regarding the patch size and benchmarks!

@bhess
Member Author

bhess commented Jan 29, 2025

Thank you for the review, @baentsch!

As a result, no further patching is required.

Hmm -- when looking at the current PR, I still see a patch file: How does this fit with the above statement @bhess?

The context of my statement was the following:

The patch size is now much reduced, basically only to adapt a few things to be able to use our fips202/sha3 implementation...

The patch mainly adjusts the include paths from fips202/fips202.h to fips202.h to work with our common implementation. I considered this small change to be manageable as a patch, but you're right that the patch still exists in the PR. All further patches could be removed, though.
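
To illustrate the kind of change described here, the following hedged sketch rewrites the include path in a source string. It is not the patch from this PR, which is a regular patch file; the function name `flatten_fips202_include` and the sample input are made up for illustration.

```python
import re

# Matches `#include "fips202/fips202.h"` and captures the parts to keep,
# so that only the `fips202/` directory prefix is dropped.
_INCLUDE_RE = re.compile(r'(#\s*include\s+")fips202/(fips202\.h")')


def flatten_fips202_include(source: str) -> str:
    """Rewrite `fips202/fips202.h` includes to `fips202.h` (hypothetical helper)."""
    return _INCLUDE_RE.sub(r"\1\2", source)


sample = '#include "fips202/fips202.h"\n#include "params.h"\n'
print(flatten_fips202_include(sample))
```

Other include lines pass through unchanged, which is why a change of this kind stays small enough to carry as a patch.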

@hanno-becker: Once the PR receives a second approval, would it be possible to create a tag or release upstream? This way, we could reference the tag/release instead of pointing to a specific commit in the main branch, as we're doing currently.

@hanno-becker

@bhess Yes, that will be fine. Once we have the second approval here, let's sync once more on which upstream commit to update this PR to, and then tag that.

@hanno-becker

hanno-becker commented Jan 30, 2025

@bhess Could you update to the latest main of mlkem-native? Once it passes your CI as well, we'll tag it as v1.0.0-alpha2.

@baentsch @praveksharma @dstebila If you're happy, could one of you provide the 2nd approval?

@hanno-becker

@bhess We created a tag v1.0.0-alpha.2. Could you update, hopefully a final time?

@bhess
Member Author

bhess commented Jan 31, 2025

@bhess We created a tag v1.0.0-alpha.2. Could you update, hopefully a final time?

Thank you @hanno-becker! The PR is updated.

@hanno-becker

@praveksharma @baentsch @dstebila Are we good to go?

Development

Successfully merging this pull request may close these issues.

ML-KEM doesn't perform encapsulation key check
7 participants