The pfx format has a huge performance loss with clang #5604

Open
claudioandre-br opened this issue Dec 2, 2024 · 1 comment

claudioandre-br commented Dec 2, 2024

  • I can easily reproduce:

gcc from Ubuntu 24

$ john | head -1; sleep 3; john --test --format=pfx
John the Ripper 1.9.0-jumbo-1+bleeding-b3bd5ea707 2024-12-01 03:06:08 +0100 OMP [linux-gnu 64-bit x86_64 AVX2 AC]
Will run 8 OpenMP threads
Benchmarking: pfx, (.pfx, .p12) [PKCS#12 PBE (SHA1/SHA2) 256/256 AVX2 8x]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 2048, cost 2 (mac-type [1:SHA1 224:SHA224 256:SHA256 384:SHA384 512:SHA512]) of 1
Raw:	26842 c/s real, 3696 c/s virtual

clang from Ubuntu 24.

$ run/john | head -1; sleep 3; run/john --test --format=pfx
John the Ripper 1.9.0-jumbo-1+bleeding-364b1ca435 2024-12-02 06:20:23 +0100 [linux-gnu 64-bit x86_64 AVX2 AC]
Benchmarking: pfx, (.pfx, .p12) [PKCS#12 PBE (SHA1/SHA2) 256/256 AVX2 8x2]... DONE
Speed for cost 1 (iteration count) of 2048, cost 2 (mac-type [1:SHA1 224:SHA224 256:SHA256 384:SHA384 512:SHA512]) of 1
Raw:	9792 c/s real, 9792 c/s virtual
$ run/john | head -1; sleep 3; run/john --test --format=pfx
John the Ripper 1.9.0-jumbo-1+bleeding-0251a0f0f8 2024-12-02 18:56:33 -0300 [linux-gnu 64-bit x86_64 AVX2 AC]
Benchmarking: pfx, (.pfx, .p12) [PKCS#12 PBE (SHA1/SHA2) 256/256 AVX2 8x]... DONE
Speed for cost 1 (iteration count) of 2048, cost 2 (mac-type [1:SHA1 224:SHA224 256:SHA256 384:SHA384 512:SHA512]) of 1
Raw:	11328 c/s real, 11328 c/s virtual

solardiz commented Dec 2, 2024

We probably need to re-tune these:

#ifndef SIMD_PARA_SHA1
#if defined(__INTEL_COMPILER)
#define SIMD_PARA_SHA1                  1
#elif defined(__clang__)
#define SIMD_PARA_SHA1                  2
#elif defined(__llvm__)
#define SIMD_PARA_SHA1                  2
#elif defined(__XOP__)
#define SIMD_PARA_SHA1                  2
#else
#define SIMD_PARA_SHA1                  1
#endif
#endif

Can you try making it 1 for clang?
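A minimal sketch of the trial change suggested above: the same ladder with only the clang branch switched from 2 to 1. The helper functions `simd_para_sha1()` and `check_simd_para_sha1()` are hypothetical, added here only to make the selected value observable, and are not part of John the Ripper. Whether 1 is actually faster with clang is exactly what the benchmark would need to confirm.

```c
#include <assert.h>
#include <stdio.h>

/* Same ladder as quoted above; only the clang branch differs (2 -> 1). */
#ifndef SIMD_PARA_SHA1
#if defined(__INTEL_COMPILER)
#define SIMD_PARA_SHA1                  1
#elif defined(__clang__)
#define SIMD_PARA_SHA1                  1   /* was 2; trial value to benchmark */
#elif defined(__llvm__)
#define SIMD_PARA_SHA1                  2
#elif defined(__XOP__)
#define SIMD_PARA_SHA1                  2
#else
#define SIMD_PARA_SHA1                  1
#endif
#endif

/* Hypothetical helper (illustration only): returns the value the ladder picked. */
int simd_para_sha1(void)
{
	return SIMD_PARA_SHA1;
}

/* Hypothetical helper (illustration only): prints the value and sanity-checks
 * that it is one of the two values the ladder can produce. */
void check_simd_para_sha1(void)
{
	printf("SIMD_PARA_SHA1 = %d\n", simd_para_sha1());
	assert(simd_para_sha1() == 1 || simd_para_sha1() == 2);
}
```

Because of the outer `#ifndef SIMD_PARA_SHA1` guard, a value defined before this block (for example `-DSIMD_PARA_SHA1=1` among the compiler flags) takes precedence over the ladder, so the trial could also be run without editing the header. How best to pass that flag through the project's build is not covered in this thread. The `8x2` versus `8x` in the benchmark banners above appears to reflect this value.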

Also, and not limited to SHA-1: I don't think we ever considered increasing these PARA values for AVX-512, where we have twice as many SIMD registers (32 instead of 16). That is unrelated to this issue, which was seen on AVX2 (so still 16 registers).
