

@Raimo33

@Raimo33 Raimo33 commented Sep 19, 2025

This PR addresses issue #1751 by calling check_arm32_assembly() by default, matching the current behavior of check_x86_64_assembly().

This should speed up the 10x26 field arithmetic (field_10x26_impl.h) in default builds. For example, Bitcoin Core currently compiles libsecp256k1 with the default options, which leads to suboptimal builds.

This change could partially address bitcoin/bitcoin#32832, given that the flamegraph shows ecdsa_verify taking 90% of IBD time.

| Benchmark | Avg (us), assembly=OFF | Avg (us), assembly=arm32 | Time reduction (%) |
|---|---:|---:|---:|
| ecdsa_verify | 379.0 | 322.0 | 15.0 |
| ecdsa_sign | 184.0 | 170.0 | 7.6 |
| ec_keygen | 160.0 | 145.0 | 9.4 |
| ecdh | 382.0 | 332.0 | 13.1 |
| schnorrsig_sign | 162.0 | 148.0 | 8.6 |
| schnorrsig_verify | 380.0 | 323.0 | 15.0 |
| ellswift_encode | 109.0 | 95.1 | 12.7 |
| ellswift_decode | 60.2 | 50.8 | 15.6 |
| ellswift_keygen | 268.0 | 240.0 | 10.4 |
| ellswift_ecdh | 395.0 | 343.0 | 13.2 |

@Raimo33 Raimo33 force-pushed the detect-arm32-asm branch 2 times, most recently from 5de771e to 2b65c2d, September 19, 2025 17:57
@hebasto
Member

hebasto commented Sep 19, 2025

> This would result in speedup on field_10x26_impl.h on default builds.

Please provide benchmarks to support this statement.

@Raimo33 Raimo33 force-pushed the detect-arm32-asm branch 2 times, most recently from d987270 to 7a613ce, September 20, 2025 13:14
@Raimo33
Author

Raimo33 commented Sep 22, 2025

> Please provide benchmarks to support this statement.

Will do. I'm buying a Raspberry Pi right now. If the benchmarks don't show an improvement, I reckon we should delete field_10x26_arm.s entirely.

@hebasto
Member

hebasto commented Sep 25, 2025

Perhaps convert this to a draft while the CI is red?

@Raimo33 Raimo33 marked this pull request as draft September 25, 2025 13:16
@Raimo33
Author

Raimo33 commented Oct 28, 2025

> Please provide benchmarks to support this statement.

I've run benchmarks on my Raspberry Pi 4. Here are the results.

Optional features:
assembly ............................ OFF

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,   378.0       ,   379.0       ,   379.0    
ecdsa_sign                    ,   184.0       ,   184.0       ,   185.0    
ec_keygen                     ,   160.0       ,   160.0       ,   160.0    
ecdh                          ,   382.0       ,   382.0       ,   383.0    
schnorrsig_sign               ,   162.0       ,   162.0       ,   162.0    
schnorrsig_verify             ,   380.0       ,   380.0       ,   381.0    
ellswift_encode               ,   109.0       ,   109.0       ,   109.0    
ellswift_decode               ,    60.1       ,    60.2       ,    60.3    
ellswift_keygen               ,   268.0       ,   268.0       ,   268.0    
ellswift_ecdh                 ,   395.0       ,   395.0       ,   395.0

Optional features:
assembly ............................ arm32

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,   322.0       ,   322.0       ,   322.0    
ecdsa_sign                    ,   170.0       ,   170.0       ,   170.0    
ec_keygen                     ,   145.0       ,   145.0       ,   145.0    
ecdh                          ,   332.0       ,   332.0       ,   333.0    
schnorrsig_sign               ,   148.0       ,   148.0       ,   149.0    
schnorrsig_verify             ,   323.0       ,   323.0       ,   324.0    
ellswift_encode               ,    94.9       ,    95.1       ,    95.3    
ellswift_decode               ,    50.6       ,    50.8       ,    50.9    
ellswift_keygen               ,   239.0       ,   240.0       ,   240.0    
ellswift_ecdh                 ,   343.0       ,   343.0       ,   343.0

@hebasto
Member

hebasto commented Oct 28, 2025

> I've run benchmarks on my Raspberry Pi 4. Here are the results.

Which compiler did you use?

@Raimo33
Author

Raimo33 commented Oct 28, 2025

> Which compiler did you use?

I used the option `-DCMAKE_TOOLCHAIN_FILE=./cmake/arm-linux-gnueabihf.toolchain.cmake`.

C compiler ............................ GNU 13.3.0, /usr/bin/arm-linux-gnueabihf-gcc

@hebasto
Member

hebasto commented Oct 28, 2025

> > Which compiler did you use?
>
> I used the option `-DCMAKE_TOOLCHAIN_FILE=./cmake/arm-linux-gnueabihf.toolchain.cmake`
>
> C compiler ............................ GNU 13.3.0, /usr/bin/arm-linux-gnueabihf-gcc

This workflow is for cross-compiling. Did you try to build natively on your RPi?

@Raimo33
Author

Raimo33 commented Oct 28, 2025

> Did you try to build natively on your RPi?

I get:

/home/pi/secp256k1/src/asm/field_10x26_arm.s:875: Error: selected processor does not support `ubfx r2,r3,#0,#22' in ARM mode
/home/pi/secp256k1/src/asm/field_10x26_arm.s:880: Error: selected processor does not support `movw r14,field_R1<<4' in ARM mode

when building natively on my RPi. The compiler is /usr/libexec/gcc/arm-linux-gnueabihf/14/.
Apparently that's because the RPi's native toolchain builds for ARMv6 rather than ARMv7, and our field_10x26_arm.s is only compatible with ARMv7: `ubfx` and `movw` are not available in plain ARMv6.
