Optimize EIP-196 AltBn128 EcAdd #301

siladu · 2025-10-31T07:34:12Z

Changes

Core changes by @ivokub

Memory Pooling for Performance

Introduces sync.Pool objects to reuse allocations and reduce garbage collection pressure:

bigIntPool - reuses big.Int allocations for scalar operations
g1Pool / g2Pool - reuses elliptic curve point allocations
bytes64Pool - reuses 64-byte buffer allocations

Simplified Error Handling

Before: Functions returned error strings passed through buffers between Go and Java
After: Functions return integer error codes (0 for success, 1-8 for various errors)
Removes overhead of string allocation and copying across JNI boundary
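
To illustrate the integer error code approach, here is a rough sketch. The constant names, their ordering, and the `describe` helper are hypothetical; the PR defines its own set of codes (0 for success, 1-8 for errors), and only a few are imitated here.

```go
package main

// errorCode is a small integer returned across the FFI boundary in place of
// an error string. All names below are hypothetical.
type errorCode int32

const (
	errCodeSuccess            errorCode = iota // 0: operation succeeded
	errCodeMalformedPoint                      // input is not a valid encoding
	errCodePointNotOnCurve                     // point fails the curve equation
	errCodeInvalidInputLength                  // input length does not fit the operation
)

// describe turns a code into a message on the caller's side, so no string
// has to be allocated or copied across the boundary.
func describe(c errorCode) string {
	switch c {
	case errCodeSuccess:
		return "success"
	case errCodeMalformedPoint:
		return "malformed point"
	case errCodePointNotOnCurve:
		return "point not on curve"
	case errCodeInvalidInputLength:
		return "invalid input length"
	default:
		return "unknown error code"
	}
}
```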

Streamlined JNI Interface

Changes function signatures from:
func eip196altbn128G1Add(input, output, errorBuf *C.char, inputLen C.int,
outputLen, errorLen *C.int) C.int
To:
func eip196altbn128G1Add(input, output *C.char, inputLen C.int) errorCode

Optimized Field Validation

Removes manual field checking (checkInFieldEIP196 function)
Uses SetBytesCanonical() which performs validation internally
Eliminates redundant modulus comparisons

Direct Encoding

g1AffineEncode now works directly with point objects using RawBytes()
Eliminates intermediate Marshal() allocations

Reduced Buffer Sizes

Result buffer size reduced from 128 to 64 bytes (only needs to hold one G1 point)
Removes 256-byte error buffer entirely

Results

Before this PR

                               |  Actual cost | Derived Cost |  Iteration time |      Throughput
EcAdd                          |      150 gas |      162 gas |      1,618.3 ns |      98.35 ±1.62 MGps
EcAddMarius                    |      150 gas |      417 gas |      4,173.7 ns |      36.87 ±0.36 MGps
EcAddAmez1                     |      150 gas |      402 gas |      4,019.2 ns |      37.66 ±0.29 MGps
EcAddAmez2                     |      150 gas |      406 gas |      4,055.4 ns |      37.66 ±0.33 MGps
EcAddAmez3                     |      150 gas |      407 gas |      4,074.4 ns |      37.81 ±0.38 MGps
EcAddCase0                     |      150 gas |      419 gas |      4,185.6 ns |      37.71 ±0.37 MGps
EcAddCase1                     |      150 gas |      404 gas |      4,040.7 ns |      37.69 ±0.33 MGps
...
EcAddCase100                   |      150 gas |      158 gas |      1,584.8 ns |     101.14 ±1.51 MGps
EcAddCase106                   |      150 gas |      141 gas |      1,412.5 ns |     110.12 ±1.73 MGps
mul1                           |    6,000 gas |    5,191 gas |     51,909.7 ns |     115.89 ±0.56 MGps
mul2                           |    6,000 gas |    5,148 gas |     51,479.8 ns |     117.04 ±0.67 MGps
2 pairings                     |   79,000 gas |   44,288 gas |    442,882.6 ns |     178.45 ±0.37 MGps
4 pairings                     |  113,000 gas |   62,853 gas |    628,526.7 ns |     179.86 ±0.35 MGps
6 pairings                     |  147,000 gas |   81,424 gas |    814,240.0 ns |     180.58 ±0.28 MGps

This PR

                               |  Actual cost | Derived Cost |  Iteration time |      Throughput
EcAdd                          |      150 gas |       76 gas |        760.4 ns |     212.72 ±2.97 MGps
EcAddMarius                    |      150 gas |      333 gas |      3,327.3 ns |      45.85 ±0.39 MGps
EcAddAmez1                     |      150 gas |      328 gas |      3,275.2 ns |      47.25 ±0.44 MGps
EcAddAmez2                     |      150 gas |      328 gas |      3,277.9 ns |      46.64 ±0.42 MGps
EcAddAmez3                     |      150 gas |      323 gas |      3,225.8 ns |      47.07 ±0.35 MGps
EcAddCase0                     |      150 gas |      322 gas |      3,225.0 ns |      47.51 ±0.36 MGps
EcAddCase1                     |      150 gas |      320 gas |      3,204.4 ns |      47.64 ±0.40 MGps
...
EcAddCase100                   |      150 gas |       71 gas |        709.3 ns |     223.52 ±2.95 MGps
EcAddCase106                   |      150 gas |       62 gas |        617.4 ns |     259.65 ±3.74 MGps
mul1                           |    6,000 gas |    5,235 gas |     52,345.7 ns |     115.08 ±0.65 MGps
mul2                           |    6,000 gas |    5,178 gas |     51,777.6 ns |     116.35 ±0.65 MGps
2 pairings                     |   79,000 gas |   44,050 gas |    440,497.9 ns |     179.42 ±0.37 MGps
4 pairings                     |  113,000 gas |   62,573 gas |    625,732.9 ns |     180.65 ±0.35 MGps
6 pairings                     |  147,000 gas |   81,163 gas |    811,627.4 ns |     181.19 ±0.34 MGps

Benchmark Details

besu-ecadd-warm-exec-invert$ time ./build/install/besu/bin/evmtool benchmark --native --warm-iterations=20000 --exec-iterations=1000 --warm-invert=true altBn128
besu/v25.7-develop-a88105d/linux-x86_64/openjdk-java-21

****************************** Hardware Specs ******************************
* VM Type: m6a.2xlarge
* OS: GNU/Linux Ubuntu 24.04.2 LTS (Noble Numbat) build 6.14.0-1009-aws
* Processor: AMD EPYC 7R13 Processor
* Microarchitecture: Zen 3
* Physical CPU packages: 1
* Physical CPU cores: 4
* Logical CPU cores: 8
* Average Max Frequency per core: 4501 MHz
* Memory Total: 32 GB

Testing

passes besu reference tests
mainnet sync test
gas-benchmarks
Fuzzing?

TODO

More testing, e.g. on mainnet, gas-benchmarks

Signed-off-by: Simon Dudley <[email protected]>
Co-authored-by: Ivo Kubjas <[email protected]>

Signed-off-by: Simon Dudley <[email protected]>

The pairing function now writes results (0x01 or 0x00) directly to the output buffer and only returns error codes for actual errors, eliminating the previous hack of using an error code to represent a valid pairing result of zero.

Signed-off-by: Simon Dudley <[email protected]>
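
A minimal sketch of that separation between result and error. The function name, the code values, and the choice to zero the first 32 bytes are assumptions for illustration; the PR's Go code may instead write only the final byte and rely on the caller's buffer being pre-zeroed.

```go
package main

type errorCode int32

const (
	errCodeSuccess             errorCode = 0
	errCodeInvalidOutputLength errorCode = 1 // hypothetical value
)

// writePairingResult stores the pairing outcome as a 32-byte big-endian
// boolean in output. The return value signals errors only; a failed pairing
// check (ok == false) is still a successful call.
func writePairingResult(output []byte, ok bool) errorCode {
	if len(output) < 32 {
		return errCodeInvalidOutputLength
	}
	for i := 0; i < 32; i++ {
		output[i] = 0
	}
	if ok {
		output[31] = 1
	}
	return errCodeSuccess
}
```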

Signed-off-by: Simon Dudley <[email protected]>

macfarla · 2025-11-06T03:18:28Z

gnark/src/test/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196ConcurrentTest.java

+              inputBytes.length,
+              output);
+
+          if (errorCode != LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS) {


Can we rename err_code_success to return_code_success?

We could, but it stems from this set of related Go consts, and it's idiomatic for them to share the same prefix, so we would need to rename them all to returnCode, and most of them are indeed error codes: https://github.com/hyperledger/besu-native/pull/301/files#diff-9622b17a1165cbfa1780cbc92d116bcbbcb4136daf03dd3d0aa4f9d77373a2ddR35-R41
I'm leaning towards keeping the current names unless you feel strongly about changing them all to returnCode...?

ivokub

I also recommend updating the gnark-crypto dependency to v0.19.2 (the most recent release). Most notably, it contains optimizations for scalar multiplication when the scalars are small.

For a lot of use cases it can provide a significant speedup (Consensys/gnark-crypto#703). The gain will be less evident here because of the JNI overhead.

To update:

cd gnark/gnark-jni
go get github.com/consensys/[email protected]
go mod tidy

I built and ran unit tests locally and tests pass. I didn't run evmtool.

Otherwise, the changes look good. I think passing the pairing return value directly is better than my previous approach (passing it through the error code).

ivokub · 2025-11-07T13:05:45Z

gnark/src/test/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196EdgeCaseTest.java

+    assertThat(errorCode).isEqualTo(LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS);
+    // The key test: byte 31 should have been written by Go code (either 0x00 or 0x01, not 0xFF)
+    assertThat(output[31]).isNotEqualTo((byte) 0xFF);
+    assertThat(output[31]).isIn((byte) 0x00, (byte) 0x01);


Should we also check that the rest is 0x00?

Done 8a312f5 (#301)

garyschulte

LGTM, one safety concern highlighted

garyschulte · 2025-11-12T17:57:10Z

gnark/src/main/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196.java

-        ret = eip196altbn128G1Add(i, output, err, i_len, o_len, err_len);
+        ret = eip196altbn128G1Add(i, output, i_len);
        break;
      case  EIP196_MUL_OPERATION_RAW_VALUE:
-        ret = eip196altbn128G1Mul(i, output, err, i_len, o_len, err_len);
+        ret = eip196altbn128G1Mul(i, output, i_len);
        break;
      case EIP196_PAIR_OPERATION_RAW_VALUE:
-        ret = eip196altbn128Pairing(i, output, err, i_len, o_len, err_len);
+        ret = eip196altbn128Pairing(i, output, i_len);
+        // Result is already written to output buffer by Go
+        // ret is only non-zero for actual errors


Removing the string error and error length is 👍. Some test fixtures in the csv may or may not need to be updated to reflect the new error response.

I want to call out however, that removing the output length makes the raw JNA -> go interface unsafe. We are loading this library with JNA, but using jna direct mapping. Direct mapping means that we don't have a proxy that is doing marshalling and bounds checking.

Removing the output buffer size and checking may be more expedient, but it opens us up to jvm crashes if somebody in the future makes an unsafe change, like sending an undersized or uninitialized output buffer. If it turns out that this optimization is worth the risk, we should add some big scary comments in the jni wrapper and/or explicit output parameters types for each function so we cannot accidentally send an undersized output buffer.

I agree - right now we're really hoping here that the i and output have correct lengths and are properly initialized. And considering the visibility is public, then in essence anyone can call it.

The main performance problem was the IntByReference type, which was used to send the actual sizes of the arrays through FFI and had significant overhead. However, I think in this case it would be sufficient to do the length checks inside the code you referred to? Then we would fail early before calling into JNA, avoiding segfaults and a JVM crash.

done b807d30 (#301)
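
The fix itself landed on the Java side. Purely as an illustration of the fail-early idea discussed in this thread, here is the same guard sketched in Go: a checked wrapper validates the output length and only then hands a fixed-size array pointer to the unchecked function. All names and code values are hypothetical, and `unsafeG1Add` is a placeholder rather than a real entrypoint.

```go
package main

type errorCode int32

const (
	errCodeSuccess             errorCode = 0
	errCodeInvalidOutputLength errorCode = 1 // hypothetical value
)

// g1ResultBytes is the size of one encoded G1 point (64 bytes per the PR).
const g1ResultBytes = 64

// unsafeG1Add stands in for a native entrypoint that writes a full result
// without bounds checks. Taking *[64]byte makes the size part of the type.
func unsafeG1Add(input []byte, output *[g1ResultBytes]byte) errorCode {
	copy(output[:], input) // placeholder body, not a curve addition
	return errCodeSuccess
}

// checkedG1Add rejects undersized buffers before the unchecked call, so a
// bad caller gets an error code instead of a crash.
func checkedG1Add(input, output []byte) errorCode {
	if len(output) < g1ResultBytes {
		return errCodeInvalidOutputLength
	}
	// Slice-to-array-pointer conversion (Go 1.17+) is safe after the check.
	return unsafeG1Add(input, (*[g1ResultBytes]byte)(output))
}
```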

Signed-off-by: Simon Dudley <[email protected]>

siladu · 2025-11-13T14:22:41Z

Rerun of benchmark with gnark-crypto v0.19.2 bump

                               |  Actual cost | Derived Cost |  Iteration time |      Throughput
EcAdd                          |      150 gas |       72 gas |        717.7 ns |     215.21 ±1.54 MGps
EcAddMarius                    |      150 gas |      324 gas |      3,240.1 ns |      46.92 ±0.30 MGps
EcAddAmez1                     |      150 gas |      319 gas |      3,193.3 ns |      48.09 ±0.31 MGps
EcAddAmez2                     |      150 gas |      313 gas |      3,131.4 ns |      48.17 ±0.21 MGps
EcAddAmez3                     |      150 gas |      315 gas |      3,152.9 ns |      48.09 ±0.26 MGps
EcAddCase0                     |      150 gas |      312 gas |      3,124.0 ns |      48.42 ±0.25 MGps
EcAddCase1                     |      150 gas |      313 gas |      3,127.4 ns |      48.45 ±0.26 MGps
...
EcAddCase100                   |      150 gas |       66 gas |        661.2 ns |     227.69 ±1.20 MGps
EcAddCase106                   |      150 gas |       59 gas |        590.4 ns |     265.09 ±2.25 MGps
mul1                           |    6,000 gas |    4,759 gas |     47,586.2 ns |     126.56 ±0.70 MGps
mul2                           |    6,000 gas |    4,712 gas |     47,121.1 ns |     128.01 ±0.81 MGps
2 pairings                     |   79,000 gas |   43,924 gas |    439,235.3 ns |     179.93 ±0.35 MGps
4 pairings                     |  113,000 gas |   62,278 gas |    622,780.7 ns |     181.51 ±0.33 MGps
6 pairings                     |  147,000 gas |   80,592 gas |    805,917.8 ns |     182.44 ±0.29 MGps

Seems to give a small improvement to EcAdd (~1 MGas/s) and mul, but not pairings. I haven't checked whether the benchmarks include the small scalars that might benefit the most.

garyschulte · 2025-11-13T14:50:23Z

gnark/gnark-jni/gnark-eip-196.go

+	if !g1.IsOnCurve() {
+		return errCodePointOnCurveCheckFailedEIP196


fwiw, if we are trying to eke out additional performance, the isOnCurve() checks are duplicated by SetBytesCanonical since gnark-crypto 0.17.0. The duplicate checks were kept out of an abundance of caution. See #262 (comment) for context.

it is worth at least testing without the duplicate isOnCurve check to determine if the impact is negligible enough to keep for "visibility" reasons within the code.

Yep, that's on my list but would rather keep this as incremental as possible - would save that for another PR.

Signed-off-by: Simon Dudley <[email protected]>

siladu · 2025-11-25T17:33:12Z

New benchmark with the latest code, i.e. the added length check. Maybe a very slight reduction in throughput, probably cancelled out the gnark-crypto upgrade 😁

                               |  Actual cost | Derived Cost |  Iteration time |      Throughput
EcAdd                          |      150 gas |       80 gas |        803.3 ns |     205.76 ±3.36 MGps
EcAddMarius                    |      150 gas |      334 gas |      3,340.0 ns |      45.95 ±0.46 MGps
EcAddAmez1                     |      150 gas |      328 gas |      3,280.6 ns |      47.02 ±0.50 MGps
EcAddAmez2                     |      150 gas |      327 gas |      3,273.8 ns |      46.88 ±0.48 MGps
EcAddAmez3                     |      150 gas |      337 gas |      3,370.1 ns |      46.84 ±0.56 MGps
EcAddCase0                     |      150 gas |      325 gas |      3,249.5 ns |      47.81 ±0.49 MGps
EcAddCase1                     |      150 gas |      322 gas |      3,220.7 ns |      47.77 ±0.45 MGps
...
EcAddCase100                   |      150 gas |       73 gas |        725.5 ns |     220.15 ±3.28 MGps
EcAddCase106                   |      150 gas |       64 gas |        642.8 ns |     251.34 ±3.94 MGps
mul1                           |    6,000 gas |    4,771 gas |     47,708.4 ns |     126.34 ±0.76 MGps
mul2                           |    6,000 gas |    4,690 gas |     46,902.1 ns |     128.48 ±0.72 MGps
2 pairings                     |   79,000 gas |   43,882 gas |    438,815.8 ns |     180.09 ±0.34 MGps
4 pairings                     |  113,000 gas |   62,162 gas |    621,623.3 ns |     181.85 ±0.33 MGps
6 pairings                     |  147,000 gas |   80,518 gas |    805,178.5 ns |     182.61 ±0.27 MGps

garyschulte

see safety comment, otherwise LGTM

garyschulte · 2025-11-25T17:56:53Z

gnark/src/main/java/org/hyperledger/besu/nativelib/gnark/LibGnarkEIP196.java

+    if (output.length < EIP196_PREALLOCATE_FOR_RESULT_BYTES) {
+      return EIP196_ERR_CODE_INVALID_OUTPUT_LENGTH;
+    }
+


This is safe for eip196_perform_operation, which is our 'contract'. The static native entrypoints are still public and potentially unsafe. IDK if these need to remain public.

I will approve, and leave it to your discretion to make them private or protected and/or add some javadoc to the public static native entrypoints that incorrect output length may lead to a jvm crash.

siladu and others added 4 commits October 31, 2025 16:40

Fix tests

545200f

Signed-off-by: Simon Dudley <[email protected]>

More unit tests

5845f82

Signed-off-by: Simon Dudley <[email protected]>

macfarla mentioned this pull request Nov 4, 2025

EcAdd precompile hyperledger/besu#8856

macfarla reviewed Nov 6, 2025

ivokub reviewed Nov 7, 2025

garyschulte reviewed Nov 12, 2025

siladu added 3 commits November 13, 2025 12:23

Review feedback - check other bytes aren't overwritten

8a312f5

Signed-off-by: Simon Dudley <[email protected]>

Merge remote-tracking branch 'upstream/main' into improve-ecadd-perf

20e4087

Bump gnark-crypto to v0.19.2

d43a197

Signed-off-by: Simon Dudley <[email protected]>

siladu force-pushed the improve-ecadd-perf branch from 14b9a14 to d43a197 Compare November 13, 2025 13:42

garyschulte reviewed Nov 13, 2025

siladu added 3 commits November 25, 2025 09:27

Clarifying comments

f42db92

Signed-off-by: Simon Dudley <[email protected]>

Remove unused err code and ensure the constants are used instead of ints

01f8fbb

Signed-off-by: Simon Dudley <[email protected]>

Check output length before calling into JNA

b807d30

Signed-off-by: Simon Dudley <[email protected]>

siladu force-pushed the improve-ecadd-perf branch from 98f5b5a to b807d30 Compare November 25, 2025 11:12

garyschulte approved these changes Nov 25, 2025
