Skip to content

fix: fail pk and plaintext ag err path [skip-line-limit]#1537

Open
hmzakhalid wants to merge 1 commit intomainfrom
fix/e3-halt-hlt-006-hlt-007
Open

fix: fail pk and plaintext ag err path [skip-line-limit]#1537
hmzakhalid wants to merge 1 commit intomainfrom
fix/e3-halt-hlt-006-hlt-007

Conversation

@hmzakhalid
Copy link
Copy Markdown
Member

@hmzakhalid hmzakhalid commented May 6, 2026

close #1536

Summary by CodeRabbit

Release Notes

  • New Features

    • Added E3Failed event for explicit terminal failure signaling during compute and verification steps.
    • Introduced DecryptionshareCreated event publishing per-party decryption shares with cryptographic proofs.
    • Enhanced aggregation events (PublicKeyAggregated, PlaintextAggregated, ThresholdShareCreated) with richer metadata including proofs and node information.
  • Bug Fixes

    • Improved error handling and validation throughout DKG and decryption workflows with proper failure routing.
  • Documentation

    • Updated Enclave Protocol flow trace documentation with current protocol design concerns and implementation updates.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
crisp Ready Ready Preview, Comment May 6, 2026 11:31am
enclave-docs Ready Ready Preview, Comment May 6, 2026 11:31am

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Walkthrough

This PR implements explicit failure signaling for public key and plaintext aggregation paths. It updates flow trace documentation, adds correlation-tracked error handling with E3Failed emission to PublicKeyAggregator and ThresholdPlaintextAggregator, introduces a decryption-round failure helper, and expands test coverage for error scenarios.

Changes

Aggregation Explicit Failure Paths

Layer / File(s) Summary
Protocol Documentation
agent/flow-trace/00_INDEX.md, agent/flow-trace/04_DKG_AND_COMPUTATION.md
Updated Verified Bugs table rows for publishCommittee and CommitteePublished payload. Expanded Protocol Design Concerns with entries for committee decentralization, gracePeriod dead code, CLI mislabeling, load-balancing, expulsion, and new failure-bridge fixes. Added DKG timeout derivation, correlation tracking for multi-stage compute requests, E3Failed terminal failures, and new decryption-aggregation paths with richer metadata.
Event Types & Imports
crates/aggregator/src/publickey_aggregator.rs (lines 8–18), crates/aggregator/src/threshold_plaintext_aggregator.rs (lines 11–21)
Extended e3_events imports to include ComputeRequestError, ComputeRequest, Die, DKGRecursiveAggregationComplete, and related types needed by new error handlers and correlation tracking.
PublicKeyAggregator Error Handling
crates/aggregator/src/publickey_aggregator.rs
Added handle_compute_request_error method to process ComputeRequestError, emit E3Failed, and reset state; implemented Handler<TypedEvent<ComputeRequestError>> and Handler<Die>; converted mixed DKG node proof bail into explicit error logging and state normalization; extended event routing in EnclaveEventData.
ThresholdPlaintextAggregator Correlation & Failure Tracking
crates/aggregator/src/threshold_plaintext_aggregator.rs
Added threshold_decryption_correlation field for in-flight request tracking; introduced fail_decryption_round helper to publish E3Failed and clear state; added rigorous validations after C6 verification (insufficient honest shares), d_commitment checks, and C7 proof count; implemented Handler<ComputeRequestError> and correlation-aware ComputeResponse handling with early returns on mismatches; extended event routing.
Dev Dependencies
crates/aggregator/Cargo.toml
Added e3-test-helpers dev-dependency to support test infrastructure.
Tests & Validation
crates/aggregator/src/publickey_aggregator.rs (lines 1267–1441), crates/aggregator/src/threshold_plaintext_aggregator.rs (lines 1127–1405)
Expanded test coverage with scenarios for ComputeRequestError handling, insufficient honest shares, decryption aggregation flow, and failure paths; added test helpers and state constructors to validate new error and recovery behaviors.

Sequence Diagram

sequenceDiagram
    participant Agg as PublicKeyAggregator/<br/>ThresholdPlaintextAggregator
    participant Worker as ComputeWorker
    participant Event as E3 Event System

    Agg->>Agg: Create ComputeRequest<br/>with CorrelationId
    Agg->>Worker: Dispatch DkgAggregation<br/>or CalculateDecryption
    Note over Worker: Compute step fails
    Worker-->>Agg: ComputeRequestError
    
    rect rgba(255, 100, 100, 0.5)
        Agg->>Agg: handle_compute_request_error:<br/>Match correlation, verify async context
        Agg->>Agg: Clear in-flight state<br/>(correlation, proofs, pending flags)
        Agg->>Event: Emit E3Failed<br/>with failure reason
    end
    
    Event-->>Agg: Event published
    Agg->>Agg: Round halted explicitly<br/>with clear failure signal
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly Related PRs

  • gnosisguild/enclave#1500: Modifies threshold plaintext aggregation with per-share C6 commitment verification and correlation handling for decrypted shares.
  • gnosisguild/enclave#1470: Adds correlation checks and gating for fold-proof handling in threshold plaintext aggregation.
  • gnosisguild/enclave#142: Modifies PublicKeyAggregator actor shutdown by introducing Handler<Die> pattern for destruction.

Suggested Reviewers

  • cedoor
  • ctrlc03
  • ryardley

🐰 A failure path so clear and bright,
No more stalling in the night!
E3Failed rings out its bell,
When aggregation goes through hell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.81% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The code changes directly address both failure modes in issue #1536: PublicKeyAggregator handles ComputeRequestError with E3Failed emission [#1536], and ThresholdPlaintextAggregator implements explicit failure routing with new fail_decryption_round helper [#1536].
Out of Scope Changes check ✅ Passed All changes are scoped to addressing issue #1536: aggregator documentation updates, error handling in PK/plaintext aggregators, correlation tracking, and supporting test infrastructure are all directly related to the terminal failure objective.
Title check ✅ Passed The title mentions 'fail pk and plaintext ag err path' which directly relates to the PR's main objective of making public key and plaintext aggregation failures explicitly terminal. However, the title uses abbreviated jargon ('ag' for aggregation, 'err' for error) and is incomplete compared to the full scope described in objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/e3-halt-hlt-006-hlt-007

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hmzakhalid hmzakhalid changed the title fix(aggregator): fail public key and plaintext aggregation terminal [skip-line-limit] fix(aggregator): fail public key and plaintext ag terminal path [skip-line-limit] May 6, 2026
@hmzakhalid hmzakhalid changed the title fix(aggregator): fail public key and plaintext ag terminal path [skip-line-limit] fix(ag): fail pk and plaintext ag terminal path [skip-line-limit] May 6, 2026
@hmzakhalid hmzakhalid changed the title fix(ag): fail pk and plaintext ag terminal path [skip-line-limit] fix: fail pk and plaintext ag err path [skip-line-limit] May 6, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agent/flow-trace/04_DKG_AND_COMPUTATION.md`:
- Around line 47-52: The doc claims computation-stage failures from
ThresholdKeyshare (GenPkShareAndSkSss, GenEsiSss, CalculateDecryptionKey,
CalculateDecryptionShare) emit E3Failed, but the code currently propagates
errors via the ? operator into trap() producing EnclaveError and
ThresholdKeyshare also doesn't subscribe to ComputeRequestError; fix by wiring
these computation-response handlers to explicitly emit E3Failed events instead
of using ? into trap() and ensure ThresholdKeyshare subscribes to
ComputeRequestError; update the four handler implementations (functions handling
GenPkShareAndSkSss, GenEsiSss, CalculateDecryptionKey, CalculateDecryptionShare)
to convert errors into the E3Failed emission path, and add subscription for
ComputeRequestError in the ThresholdKeyshare setup so computation failures
trigger the E3Failed path rather than EnclaveError.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0f300a69-6c5f-4e64-960b-0ce6f2acb056

📥 Commits

Reviewing files that changed from the base of the PR and between c7e9802 and 1e5d70d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • agent/flow-trace/00_INDEX.md
  • agent/flow-trace/04_DKG_AND_COMPUTATION.md
  • crates/aggregator/Cargo.toml
  • crates/aggregator/src/publickey_aggregator.rs
  • crates/aggregator/src/threshold_plaintext_aggregator.rs

Comment on lines +47 to +52
│ └─ Collector timeouts are derived from the DKG stage budget:
│ ├─ shared base window from `E3_DKG_WINDOW_SECS` (default 7200s,
│ │ matching current production `Enclave` deployment config)
│ ├─ EncryptionKeyCollector cutoff at 10% of the DKG window
│ ├─ ThresholdShareCollector cutoff at 60% of the DKG window
│ └─ per-collector env vars still override these derived defaults
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm ThresholdKeyshare actually emits E3Failed for the stages described in 04_DKG_AND_COMPUTATION.md.
fd -t f 'threshold_keyshare.rs|threshold_keyshare/' -X rg -nP -C3 'E3Failed|EncryptionKeyCollectionFailed|ThresholdShareCollectionFailed|CalculateDecryptionShare|CalculateDecryptionKey|GenEsiSss|GenPkShareAndSkSss' {} \;
rg -nP --type=rust -C2 'E3Failed\s*\{' -g '!crates/aggregator/**'

Repository: gnosisguild/enclave

Length of output: 19851


🏁 Script executed:

# Look at the full implementation of computation response handlers in threshold_keyshare.rs
fd -type f -name 'threshold_keyshare.rs' -x sh -c '
  echo "=== handle_gen_pk_share_and_sk_sss_response ==="
  sed -n "914,960p" "$1"
  echo ""
  echo "=== handle_gen_esi_sss_response ==="
  sed -n "997,1010p" "$1"
  echo ""
  echo "=== handle_calculate_decryption_key_response ==="
  sed -n "1837,1900p" "$1"
  echo ""
  echo "=== handle_calculate_decryption_share_response ==="
  sed -n "2136,2180p" "$1"
' sh {} \;

Repository: gnosisguild/enclave

Length of output: 234


🏁 Script executed:

# Find and read the threshold_keyshare.rs file
fd 'threshold_keyshare.rs' | head -1 | xargs -I {} sh -c '
  echo "=== handle_gen_pk_share_and_sk_sss_response ==="
  sed -n "914,960p" "$1"
  echo ""
  echo "=== handle_gen_esi_sss_response ==="
  sed -n "997,1010p" "$1"
  echo ""
  echo "=== handle_calculate_decryption_key_response ==="
  sed -n "1837,1900p" "$1"
  echo ""
  echo "=== handle_calculate_decryption_share_response ==="
  sed -n "2136,2180p" "$1"
' sh {}

Repository: gnosisguild/enclave

Length of output: 7158


🏁 Script executed:

# Check where handle_compute_response is called and how errors are handled
rg -n 'handle_compute_response|pub fn handle_compute_response' crates/keyshare/src/threshold_keyshare.rs -B 2 -A 10 | head -100

Repository: gnosisguild/enclave

Length of output: 1032


🏁 Script executed:

# Find the trap function definition
rg -n 'fn trap|pub fn trap' -A 15 --type rust | head -60

Repository: gnosisguild/enclave

Length of output: 2231


🏁 Script executed:

# Find the ErrorDispatcher trait and its err method implementation
rg -n 'trait ErrorDispatcher|fn err|pub fn err' -A 10 crates/events/src/enclave_event/enclave_error.rs | head -80

Repository: gnosisguild/enclave

Length of output: 904


🏁 Script executed:

# Find BusHandle and its err method
rg -n 'impl.*ErrorDispatcher.*BusHandle|fn err' crates/events/src/ -A 10 | grep -A 15 'BusHandle'

Repository: gnosisguild/enclave

Length of output: 3737


🏁 Script executed:

# Find the event_from_error method
rg -n 'fn event_from_error|event_from_error' crates/events/src/bus_handle.rs -A 20 | head -80

Repository: gnosisguild/enclave

Length of output: 1934


🏁 Script executed:

# Find the EnclaveEvent::from_error implementation
rg -n 'fn from_error' crates/events/src/ -A 20 | head -100

Repository: gnosisguild/enclave

Length of output: 6451


🏁 Script executed:

# Check if there are any error handlers that convert EnclaveError to E3Failed
rg -n 'EnclaveError|EnclaveEventData::EnclaveError' crates/keyshare/src/threshold_keyshare.rs -B 3 -A 5

Repository: gnosisguild/enclave

Length of output: 45


🏁 Script executed:

# Check how EnclaveError events are handled in threshold_keyshare.rs
rg -n 'EnclaveEventData::' crates/keyshare/src/threshold_keyshare.rs | head -20

Repository: gnosisguild/enclave

Length of output: 1082


🏁 Script executed:

# Check if EnclaveError events are handled anywhere that might convert them to E3Failed
rg -n 'EnclaveError|Handler.*EnclaveError' crates/keyshare/src/ -B 2 -A 10

Repository: gnosisguild/enclave

Length of output: 45


🏁 Script executed:

# Search for any handlers or converters that might turn EnclaveError into E3Failed
rg -n 'EnclaveEventData::EnclaveError' crates/ -B 3 -A 5

Repository: gnosisguild/enclave

Length of output: 1568


🏁 Script executed:

# Read the specific lines mentioned in the review comment
sed -n '47,52p;114,121p;174,182p;188,194p;323,330p;688,694p' agent/flow-trace/04_DKG_AND_COMPUTATION.md

Repository: gnosisguild/enclave

Length of output: 2318


🏁 Script executed:

# Search for ComputeRequestError handling
rg -n 'ComputeRequestError' crates/keyshare/src/threshold_keyshare.rs -B 3 -A 5

Repository: gnosisguild/enclave

Length of output: 45


🏁 Script executed:

# Search for ComputeRequestError across the entire codebase
rg -n 'ComputeRequestError' crates/ --type rust | head -20

Repository: gnosisguild/enclave

Length of output: 2620


🏁 Script executed:

# Check if ThresholdKeyshare subscribes to ComputeRequestError
rg -n 'subscribe.*ComputeRequestError|ComputeRequestError' crates/keyshare/src/threshold_keyshare.rs

Repository: gnosisguild/enclave

Length of output: 45


Verify the ThresholdKeyshare-side E3Failed routing for computation stage failures.

The documentation claims E3Failed emission from ThresholdKeyshare for GenPkShareAndSkSss, GenEsiSss, CalculateDecryptionKey, and CalculateDecryptionShare errors, but these handlers are not wired to emit E3Failed in the code.

Verification confirms that:

  • EncryptionKeyCollectionFailed and ThresholdShareCollectionFailed handlers exist (lines 2471, 2499) and do emit E3Failed (line 2562) ✓
  • The four computation response handlers exist but use ? operator to propagate errors into the trap() function, which converts them to EnclaveError events instead of E3Failed
  • ThresholdKeyshare does not subscribe to ComputeRequestError events, so computation failures don't trigger the documented E3Failed path

The doc is ahead of the code for computation stage failures. Either implement the E3Failed handlers for these stages or update the documentation to reflect that computation errors become EnclaveError events, not E3Failed.

Also applies to: 114-121, 174-182, 188-194, 323-330, 688-694

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agent/flow-trace/04_DKG_AND_COMPUTATION.md` around lines 47 - 52, The doc
claims computation-stage failures from ThresholdKeyshare (GenPkShareAndSkSss,
GenEsiSss, CalculateDecryptionKey, CalculateDecryptionShare) emit E3Failed, but
the code currently propagates errors via the ? operator into trap() producing
EnclaveError and ThresholdKeyshare also doesn't subscribe to
ComputeRequestError; fix by wiring these computation-response handlers to
explicitly emit E3Failed events instead of using ? into trap() and ensure
ThresholdKeyshare subscribes to ComputeRequestError; update the four handler
implementations (functions handling GenPkShareAndSkSss, GenEsiSss,
CalculateDecryptionKey, CalculateDecryptionShare) to convert errors into the
E3Failed emission path, and add subscription for ComputeRequestError in the
ThresholdKeyshare setup so computation failures trigger the E3Failed path rather
than EnclaveError.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HLT-006 / HLT-007: make PK and plaintext aggregation fail the round explicitly instead of stalling

1 participant