Skip to content

Slowdown after switch to crypto-bigint #490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fjarri opened this issue Mar 5, 2025 · 7 comments · Fixed by #506
Closed

Slowdown after switch to crypto-bigint #490

fjarri opened this issue Mar 5, 2025 · 7 comments · Fixed by #506

Comments

@fjarri
Copy link
Contributor

fjarri commented Mar 5, 2025

Benchmarks slowed down several times after #394. What could be the reason?

Performance improvements for boxed uints (RustCrypto/crypto-bigint#777) and the corresponding changes in crypto-primes (entropyxyz/crypto-primes#78) don't really change the results much according to my tests.

sign test time is split ~50/50 between Montgomery exponentiation (with all the time spent in almost_montgomery_mul(), the lowest level function) and BoxedUint::inv_mod(); decrypt is almost exclusively exponentiation. Both of these are the calls in RSA itself; crypto-primes calls take negligible time.

So it seems that either these two functions are somehow much slower than num-bigint (possible, yet unlikely, and straightforward to test), or somehow #394 changed the algorithm to apply them more times than necessary.

@fjarri
Copy link
Contributor Author

fjarri commented Mar 5, 2025

So one difference that I see is that modpow() (and Montgomery multiplication in general) in num-bigint-dig was not, in fact, constant-time. But I would not expect just that account for such a huge difference.

@tarcieri
Copy link
Member

tarcieri commented Mar 5, 2025

Yeah, that's one difference: num_bigint::BigUint has a normalize function which automatically strips leading zeros which is called all over the place.

However, that could also be a clue to the slowdown, namely there may be places where fixed-but-dynamic precision BoxedUints are using a larger size/precision than necessary for a given key size, causing the number of iterations looping through limbs to be much larger than num-bigint (i.e. because they're larger than necessary).

That's my best guess anyway. I'm not sure to sleuth out exactly where that might be happening other than tediously going through line-by-line and comparing the implementations or otherwise looking for unnecessary leading zeros (via e.g. dbg! inspection)

@dignifiedquire
Copy link
Member

The difference comes from the fact that we now do a lot of operation over numbers twice as large as they need to be, at least that is what I remember when I last checked.

if each prime has x bits, we end up doing operations on numbers of the size of 2x bits in a bunch of places, when often times something much smaller would suffice.

@fjarri
Copy link
Contributor Author

fjarri commented Mar 6, 2025

Yep, I just reached that conclusion as well. There are several instances of exponentiation modulo p or q in rsa_decrypt() with the moduli kept in a BoxedUint the size of p * q. That still leaves the inversion, but it's probably the same problem there.

@fjarri
Copy link
Contributor Author

fjarri commented Mar 6, 2025

Btw if anyone wants to deal with this issue, feel free, I don't think I will have enough free time in the coming weeks.

@flatz
Copy link

flatz commented Apr 21, 2025

Hello. Are there any updates on this? I have noticed that it affects my code a lot.

@tarcieri
Copy link
Member

Someone still needs to investigate and find where values are overly wide. That's something I've put on the backburner until we get the rest of the latest releases out

tarcieri pushed a commit that referenced this issue Apr 28, 2025
Fixes #490 

- Bump `crypto-bigint` dependency to the current trunk.
- Use the width of a single prime instead of the full modulus where
possible. Brings `bench_rsa_2048_pkcsv1_decrypt` to pre-`crypto-bigint`
values.
- Add a check to `RsaPrivateKey::from_components()` to ensure
consistency between the primes and the modulus.
- Remove zero padding limbs from the primes and the modulus in
`RsaPrivateKey::from_components()`.
- Add a check to `rsa_decrypt()` to ensure the bit precision of the
ciphertext is the same as that of the modulus.

Notes:
- `bench_rsa_2048_pkcsv1_sign_blinded` can be restored to the original
performance by using variable-time inversion in
`algorithms::rsa::blind()` (as it was during `num-bigint` times), but it
seems to me that the blinding factor must be kept secret, so we have to
use the constant-time inversion. This leads to about 5x slowdown
compared to pre-`crypto-bigint` performance.
- The changes in test vectors are due to
RustCrypto/crypto-bigint#781

### Possible further improvements
- Keep precomputed values as `Odd`/`NonZero` as appropriate.
- If RustCrypto/crypto-bigint#811 is fixed,
resizing integers for the purposes of division can be avoided.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants