-
Notifications
You must be signed in to change notification settings - Fork 51
Optimize EIP-196 AltBn128 EcAdd #301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1. Memory Pooling for Performance
Introduces sync.Pool objects to reuse allocations and reduce garbage collection pressure:
- bigIntPool - reuses big.Int allocations for scalar operations
- g1Pool / g2Pool - reuses elliptic curve point allocations
- bytes64Pool - reuses 64-byte buffer allocations
2. Simplified Error Handling
- Before: Functions returned error strings passed through buffers between Go and Java
- After: Functions return integer error codes (0 for success, 1-8 for various errors)
- Removes overhead of string allocation and copying across JNI boundary
3. Streamlined JNI Interface
Changes function signatures from:
func eip196altbn128G1Add(input, output, errorBuf *C.char, inputLen C.int,
outputLen, errorLen *C.int) C.int
To:
func eip196altbn128G1Add(input, output *C.char, inputLen C.int) errorCode
4. Optimized Field Validation
- Removes manual field checking (checkInFieldEIP196 function)
- Uses SetBytesCanonical() which performs validation internally
- Eliminates redundant modulus comparisons
5. Direct Encoding
- g1AffineEncode now works directly with point objects using RawBytes()
- Eliminates intermediate Marshal() allocations
6. Reduced Buffer Sizes
- Result buffer size reduced from 128 to 64 bytes (only needs to hold one G1 point)
- Removes 256-byte error buffer entirely
Signed-off-by: Simon Dudley <[email protected]>
Co-authored-by: Ivo Kubjas <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
The pairing function now writes results (0x01 or 0x00) directly to the output buffer and only returns error codes for actual errors, eliminating the previous hack of using an error code to represent a valid pairing result of zero. Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
| inputBytes.length, | ||
| output); | ||
|
|
||
| if (errorCode != LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call err_code_success return_code_success?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but it stems from this set of related go consts and it's idiomatic to share the same prefix, so would need to change them all to returnCode and most of them are indeed errorCodes https://github.com/hyperledger/besu-native/pull/301/files#diff-9622b17a1165cbfa1780cbc92d116bcbbcb4136daf03dd3d0aa4f9d77373a2ddR35-R41
I'm leaning towards keeping unless you feel strongly to change them all to returnCode...?
ivokub
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also recommend updating gnark-crypto dependency to v0.19.2 (most recent). Most concretely, it contains optimizations for scalar multiplication in case scalars are small.
For a lot of use-cases it can provide significant speedup (Consensys/gnark-crypto#703). It will be less evident due to JNI.
To update:
cd gnark/gnark-jni
go get github.com/consensys/[email protected]
go mod tidyI built and ran unit tests locally and tests pass. I didn't run evmtool.
Otherwise, the changes look good - I think passing directly the pairing return value is better with my previous approach (by passing it through error code).
| assertThat(errorCode).isEqualTo(LibGnarkEIP196.EIP196_ERR_CODE_SUCCESS); | ||
| // The key test: byte 31 should have been written by Go code (either 0x00 or 0x01, not 0xFF) | ||
| assertThat(output[31]).isNotEqualTo((byte) 0xFF); | ||
| assertThat(output[31]).isIn((byte) 0x00, (byte) 0x01); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also check that rest is 0x00?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 8a312f5 (#301)
garyschulte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one safety concern highlighted
| ret = eip196altbn128G1Add(i, output, err, i_len, o_len, err_len); | ||
| ret = eip196altbn128G1Add(i, output, i_len); | ||
| break; | ||
| case EIP196_MUL_OPERATION_RAW_VALUE: | ||
| ret = eip196altbn128G1Mul(i, output, err, i_len, o_len, err_len); | ||
| ret = eip196altbn128G1Mul(i, output, i_len); | ||
| break; | ||
| case EIP196_PAIR_OPERATION_RAW_VALUE: | ||
| ret = eip196altbn128Pairing(i, output, err, i_len, o_len, err_len); | ||
| ret = eip196altbn128Pairing(i, output, i_len); | ||
| // Result is already written to output buffer by Go | ||
| // ret is only non-zero for actual errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing the string error and error length are 👍 . Some test fixtures in the csv may or may not need to be updated to reflect the new error response.
I want to call out however, that removing the output length makes the raw JNA -> go interface unsafe. We are loading this library with JNA, but using jna direct mapping. Direct mapping means that we don't have a proxy that is doing marshalling and bounds checking.
Removing the output buffer size and checking may be more expedient, but it opens us up to jvm crashes if somebody in the future makes an unsafe change, like sending an undersized or uninitialized output buffer. If it turns out that this optimization is worth the risk, we should add some big scary comments in the jni wrapper and/or explicit output parameters types for each function so we cannot accidentally send an undersized output buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree - right now we're really hoping here that the i and output have correct lengths and are properly initialized. And considering the visibility is public, then in essence anyone can call it.
The main performance problem was with the IntByReference type which was used to send the actual sizes of the arrays through FFI (and it had significant overhead). However, I think in this case it would be sufficient if we would do the length-checks inside your referred code? Then we would fail early before calling into JNA, avoiding segfaults and JVM crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done b807d30 (#301)
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
14b9a14 to
d43a197
Compare
|
Rerun of benchmark with gnark-crypto v0.19.2 bump Seems to give a small improvement to EcAdd (~1 MGas/s) and mul, but not pairings. Not checked if the benching include the small scalars that might benefit the most. |
| if !g1.IsOnCurve() { | ||
| return errCodePointOnCurveCheckFailedEIP196 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, if we are trying to eke out additional performance, the isOnCurve() checks are duplicated by SetBytesCanonical since gnark-crypto 0.17.0. Doing duplicate checks were kept out of an abundance of caution. see #262 (comment) for context
it is worth at least testing without the duplicate isOnCurve check to determine if the impact is negligible enough to keep for "visibility" reasons within the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's on my list but would rather keep this as incremental as possible - would save that for another PR.
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
Signed-off-by: Simon Dudley <[email protected]>
98f5b5a to
b807d30
Compare
|
New benchmark with the latest code, i.e. the added length check. Maybe a very slight reduction in throughput, probably cancelled out the gnark-crypto upgrade 😁 |
garyschulte
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see safety comment, otherwise LGTM
| if (output.length < EIP196_PREALLOCATE_FOR_RESULT_BYTES) { | ||
| return EIP196_ERR_CODE_INVALID_OUTPUT_LENGTH; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safe for eip196_perform_operation, which is our 'contract'. The static native entrypoints are still public and potentially unsafe. IDK if these need to remain public.
I will approve, and leave it to your discretion to make them private or protected and/or add some javadoc to the public static native entrypoints that incorrect output length may lead to a jvm crash.
Changes
Core changes by @ivokub
Introduces sync.Pool objects to reuse allocations and reduce garbage collection pressure:
Changes function signatures from:
func eip196altbn128G1Add(input, output, errorBuf *C.char, inputLen C.int,
outputLen, errorLen *C.int) C.int
To:
func eip196altbn128G1Add(input, output *C.char, inputLen C.int) errorCode
Results
Before this PR
This PR
Benchmark Details
Testing
TODO
More testing, e.g. on mainnet, gas-benchmarks