feat: stop encrypting DKG shares to self#1529
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThe PR implements a protocol shift in the DKG share decryption flow, transitioning from decrypting ChangesDKG Share Decryption with Own-Party Separation
Sequence DiagramsequenceDiagram
participant KeyMgr as Keyshare Manager
participant BFV as BFV Encryption
participant Collector as Share Collector
participant Threshold as Threshold Keyshare
participant Circuit as Circuit Computation
participant Prover as ZK Prover
KeyMgr->>BFV: Encrypt shares with skip_idx=own_party_id
BFV->>BFV: For each recipient: if idx==skip_idx store None<br/>else encrypt to Some(ciphertext)
BFV-->>KeyMgr: BfvEncryptedShares with Some/None slots
KeyMgr->>Collector: Setup(own_party_id)<br/>Initialize todo excluding own_party_id
Collector-->>KeyMgr: Collector ready (no self-collection)
KeyMgr->>Threshold: Cache own_sk_share_raw<br/>and own_esi_shares_raw
Threshold->>Threshold: Store in pending_own_dkg_shares
KeyMgr->>Threshold: Transition to AggregatingDecryptionKey<br/>Consume pending_own_dkg_shares
Threshold->>Threshold: Persist own share material<br/>Compute own_plaintext_idx in sorted order
Threshold->>Threshold: Deserialize cached own shares<br/>Splice into collected plaintext matrices<br/>at own_plaintext_idx position
Threshold->>Threshold: Build C4 requests:<br/>own_plaintext_idx + own_share_raw<br/>+ (H-1) external ciphertexts
Threshold->>Circuit: ShareDecryptionCircuitData<br/>{ honest_ciphertexts: Vec<Option<...>>,<br/>own_plaintext_share, ... }
Circuit->>Circuit: For each slot in honest_ciphertexts:<br/>if Some(cts) → decrypt<br/>if None → use own_plaintext_share directly
Circuit-->>Prover: Computed commitments & shares
Prover->>Prover: match extract_for_party():<br/>Some → publish proof<br/>None → trace skip (no error)
Prover-->>KeyMgr: Proof published/skipped
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes This PR spans nine files with intricate changes across share encryption, circuit data structures, and proof-generation logic. The protocol shift requires understanding: (1) the mixed optional-slot representation in ciphertexts and how it flows end-to-end, (2) the caching and splicing of own plaintext shares at the correct sorted position, (3) the conditional logic in circuit computation that branches on Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. 👉 Get your free trial and get 200 agent minutes per Slack user (a $50 value). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/keyshare/src/threshold_keyshare.rs`:
- Around line 1179-1188: The code currently passes party_id as the slot index to
BfvEncryptedShares::encrypt_all_extended and to the C3 skip logic, which is
wrong when collected_encryption_keys is filtered or reordered; instead compute
the self slot by finding the position of the local party_id inside
collected_encryption_keys and use that index as the skip index. Concretely:
locate where collected_encryption_keys is available, find the index via a
position lookup of entries matching party_id (e.g.,
collected_encryption_keys.iter().position(|(id, _)| id == &party_id)), handle
the none case with an error, and pass that computed usize as the skip_idx to
encrypt_all_extended and any C3 skip checks rather than using party_id directly
(references: collected_encryption_keys, party_id,
BfvEncryptedShares::encrypt_all_extended, C3 skip logic).
In `@crates/multithread/src/multithread.rs`:
- Around line 1194-1210: After deserializing own_share_raw into
own_plaintext_share you must also validate each inner row length matches the BFV
coefficient count (the expected number of coefficients used for plaintext
polynomials) to avoid variable-length shares; iterate over own_plaintext_share
(use enumerate to get the row index), and for each row check row.len() ==
expected_coefficient_count (the BFV/poly coefficient count used elsewhere in the
module), returning Err(make_zk_error(&request, format!("own_plaintext_share[{}]
has {} coefficients, expected {}", i, row.len(), expected_coefficient_count)))
on mismatch instead of proceeding.
In `@crates/zk-helpers/src/circuits/dkg/share_decryption/computation.rs`:
- Around line 201-204: The loop currently calls
data.secret_key.try_decrypt(&party_cts[mod_idx]).unwrap() which panics on
decryption failure; change it to propagate the error as a CircuitsErrors so the
method returns a proper ComputeRequestError instead of crashing. Replace the
unwrap usage in the loop that produces decrypted_pt (inside the function that
iterates mod_idx and computes share_coeffs) with error handling that maps the
try_decrypt Err into an appropriate CircuitsErrors variant (including context
like which mod_idx/party_cts failed) and returns it (using ? or map_err(...)?),
ensuring the rest of the function keeps its existing error type flow.
In `@crates/zk-prover/src/actors/proof_request.rs`:
- Around line 1244-1280: The match arm for
share.extract_for_party(positional_idx) currently treats every None as an
expected self-slot and just traces; change it to only skip (trace) when the None
corresponds to the sender’s own slot and treat any other None as an error:
inside the None branch compare the missing slot to the sender identity (use the
module’s sender/self positional identifier available in this actor—e.g., the
field that represents our own positional index or party id used elsewhere with
real_party_id/positional_idx), keep the trace and skip only for the own-slot
case, and for any other positional_idx log an error (include real_party_id and
positional_idx) so missing external recipient shares aren’t silently dropped
instead of being surfaced for upstream handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6201a406-25b2-497b-aeba-c561c80f41f3
📒 Files selected for processing (9)
crates/events/src/enclave_event/compute_request/zk.rscrates/keyshare/src/threshold_keyshare.rscrates/keyshare/src/threshold_share_collector.rscrates/multithread/src/multithread.rscrates/trbfv/src/shares/bfv_encrypted.rscrates/zk-helpers/src/circuits/dkg/share_decryption/circuit.rscrates/zk-helpers/src/circuits/dkg/share_decryption/computation.rscrates/zk-helpers/src/circuits/dkg/share_decryption/sample.rscrates/zk-prover/src/actors/proof_request.rs
| // BFV-encrypt shares to all recipients except own slot (own share is bound via C2, | ||
| // consumed locally by C4). Returns per-row randomness for C3 proofs. | ||
| let mut rng = OsRng; | ||
| let (encrypted_sk_sss, sk_witnesses) = BfvEncryptedShares::encrypt_all_extended( | ||
| &decrypted_sk_sss, | ||
| &recipient_pks, | ||
| ¶ms, | ||
| &mut rng, | ||
| Some(party_id as usize), | ||
| )?; |
There was a problem hiding this comment.
Derive the self slot from collected_encryption_keys, not from party_id.
encrypt_all_extended(..., skip_idx) is slot-based, but this code passes party_id as usize and reuses the same assumption for the C3 skips. That only works if collected_encryption_keys is always dense and perfectly ordered by party ID. Once an earlier party is filtered out, we skip the wrong slot, still emit a self-share, and drop another recipient's share/proofs instead.
🛠 Proposed fix
+ let own_recipient_idx = collected_encryption_keys
+ .iter()
+ .position(|k| k.party_id == party_id)
+ .ok_or_else(|| anyhow!("Own party {} missing from collected_encryption_keys", party_id))?;
+
// BFV-encrypt shares to all recipients except own slot (own share is bound via C2,
// consumed locally by C4). Returns per-row randomness for C3 proofs.
let mut rng = OsRng;
let (encrypted_sk_sss, sk_witnesses) = BfvEncryptedShares::encrypt_all_extended(
&decrypted_sk_sss,
&recipient_pks,
¶ms,
&mut rng,
- Some(party_id as usize),
+ Some(own_recipient_idx),
)?;
@@
BfvEncryptedShares::encrypt_all_extended(
esi,
&recipient_pks,
¶ms,
&mut rng,
- Some(party_id as usize),
+ Some(own_recipient_idx),
)
})
@@
- let own_idx = party_id as usize;
+ let own_idx = own_recipient_idx;Also applies to: 1248-1252, 1280-1283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/keyshare/src/threshold_keyshare.rs` around lines 1179 - 1188, The code
currently passes party_id as the slot index to
BfvEncryptedShares::encrypt_all_extended and to the C3 skip logic, which is
wrong when collected_encryption_keys is filtered or reordered; instead compute
the self slot by finding the position of the local party_id inside
collected_encryption_keys and use that index as the skip index. Concretely:
locate where collected_encryption_keys is available, find the index via a
position lookup of entries matching party_id (e.g.,
collected_encryption_keys.iter().position(|(id, _)| id == &party_id)), handle
the none case with an error, and pass that computed usize as the skip_idx to
encrypt_all_extended and any C3 skip checks rather than using party_id directly
(references: collected_encryption_keys, party_id,
BfvEncryptedShares::encrypt_all_extended, C3 skip logic).
| // Own-plaintext share rows: bincode `Vec<Vec<u64>>` shape [L][N]. | ||
| let own_share_bytes = req | ||
| .own_share_raw | ||
| .access_raw(cipher) | ||
| .map_err(|e| make_zk_error(&request, format!("own_share decrypt: {}", e)))?; | ||
| let own_plaintext_share: Vec<Vec<u64>> = bincode::deserialize(&own_share_bytes) | ||
| .map_err(|e| make_zk_error(&request, format!("own_share deserialize: {}", e)))?; | ||
| if own_plaintext_share.len() != l { | ||
| return Err(make_zk_error( | ||
| &request, | ||
| format!( | ||
| "own_plaintext_share has {} moduli, expected {}", | ||
| own_plaintext_share.len(), | ||
| l | ||
| ), | ||
| )); | ||
| } |
There was a problem hiding this comment.
Validate the inner N dimension of own_share_raw too.
This only checks the outer [L] shape. If any decoded row has the wrong coefficient count, the circuit gets variable-length plaintext shares and fails much later with a less actionable proof/witness error. Validate each row against the BFV degree here.
Suggested fix
let own_plaintext_share: Vec<Vec<u64>> = bincode::deserialize(&own_share_bytes)
.map_err(|e| make_zk_error(&request, format!("own_share deserialize: {}", e)))?;
if own_plaintext_share.len() != l {
return Err(make_zk_error(
&request,
format!(
"own_plaintext_share has {} moduli, expected {}",
own_plaintext_share.len(),
l
),
));
}
+ let expected_degree = dkg_params.degree();
+ if let Some((row_idx, row)) = own_plaintext_share
+ .iter()
+ .enumerate()
+ .find(|(_, row)| row.len() != expected_degree)
+ {
+ return Err(make_zk_error(
+ &request,
+ format!(
+ "own_plaintext_share[{}] has {} coefficients, expected {}",
+ row_idx,
+ row.len(),
+ expected_degree
+ ),
+ ));
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/multithread/src/multithread.rs` around lines 1194 - 1210, After
deserializing own_share_raw into own_plaintext_share you must also validate each
inner row length matches the BFV coefficient count (the expected number of
coefficients used for plaintext polynomials) to avoid variable-length shares;
iterate over own_plaintext_share (use enumerate to get the row index), and for
each row check row.len() == expected_coefficient_count (the BFV/poly coefficient
count used elsewhere in the module), returning Err(make_zk_error(&request,
format!("own_plaintext_share[{}] has {} coefficients, expected {}", i,
row.len(), expected_coefficient_count))) on mismatch instead of proceeding.
| for mod_idx in 0..threshold_l { | ||
| let decrypted_pt = | ||
| data.secret_key.try_decrypt(&party_cts[mod_idx]).unwrap(); | ||
| let share_coeffs = decrypted_pt.value.deref().to_vec(); |
There was a problem hiding this comment.
Propagate BFV decryption failures instead of panicking.
A bad ciphertext here will panic the prover worker via unwrap(), which skips the normal ComputeRequestError flow and turns invalid input into a task-level crash. This should return a CircuitsErrors like the rest of the method.
Suggested fix
for mod_idx in 0..threshold_l {
- let decrypted_pt =
- data.secret_key.try_decrypt(&party_cts[mod_idx]).unwrap();
+ let decrypted_pt = data.secret_key
+ .try_decrypt(&party_cts[mod_idx])
+ .map_err(|e| {
+ CircuitsErrors::Other(format!(
+ "Failed to decrypt honest_ciphertexts[{}]: {:?}",
+ mod_idx, e
+ ))
+ })?;
let share_coeffs = decrypted_pt.value.deref().to_vec();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for mod_idx in 0..threshold_l { | |
| let decrypted_pt = | |
| data.secret_key.try_decrypt(&party_cts[mod_idx]).unwrap(); | |
| let share_coeffs = decrypted_pt.value.deref().to_vec(); | |
| for mod_idx in 0..threshold_l { | |
| let decrypted_pt = data.secret_key | |
| .try_decrypt(&party_cts[mod_idx]) | |
| .map_err(|e| { | |
| CircuitsErrors::Other(format!( | |
| "Failed to decrypt honest_ciphertexts[{}]: {:?}", | |
| mod_idx, e | |
| )) | |
| })?; | |
| let share_coeffs = decrypted_pt.value.deref().to_vec(); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/zk-helpers/src/circuits/dkg/share_decryption/computation.rs` around
lines 201 - 204, The loop currently calls
data.secret_key.try_decrypt(&party_cts[mod_idx]).unwrap() which panics on
decryption failure; change it to propagate the error as a CircuitsErrors so the
method returns a proper ComputeRequestError instead of crashing. Replace the
unwrap usage in the loop that produces decrypted_pt (inside the function that
iterates mod_idx and computes share_coeffs) with error handling that maps the
try_decrypt Err into an appropriate CircuitsErrors variant (including context
like which mod_idx/party_cts failed) and returns it (using ? or map_err(...)?),
ensuring the rest of the function keeps its existing error type flow.
| match share.extract_for_party(positional_idx) { | ||
| Some(party_share) => { | ||
| let c3a_proofs = signed_c3a_map | ||
| .get(&positional_idx) | ||
| .cloned() | ||
| .unwrap_or_default(); | ||
| let c3b_proofs = signed_c3b_map | ||
| .get(&positional_idx) | ||
| .cloned() | ||
| .unwrap_or_default(); | ||
|
|
||
| if let Err(err) = self.bus.publish( | ||
| ThresholdShareCreated { | ||
| e3_id: e3_id.clone(), | ||
| share: Arc::new(party_share), | ||
| target_party_id: real_party_id, | ||
| external: false, | ||
| signed_c2a_proof: Some(signed_c2a.clone()), | ||
| signed_c2b_proof: Some(signed_c2b.clone()), | ||
| signed_c3a_proofs: c3a_proofs, | ||
| signed_c3b_proofs: c3b_proofs, | ||
| }, | ||
| ec.clone(), | ||
| ) { | ||
| error!( | ||
| "Failed to publish ThresholdShareCreated for party {} (idx {}): {err}", | ||
| real_party_id, positional_idx | ||
| ); | ||
| } | ||
| } | ||
| None => { | ||
| // Own slot is sparse (no self-encryption); nothing to publish. | ||
| trace!( | ||
| "Skipping ThresholdShareCreated for own slot (party {} idx {})", | ||
| real_party_id, | ||
| positional_idx | ||
| ); |
There was a problem hiding this comment.
Don't silently drop unexpected empty recipient slots.
extract_for_party() now returns None for both the intentional self slot and any other empty slot. This branch treats every None as expected, so a missing external recipient share/proof will only emit a trace and never get published, which can turn an upstream bug into a hard-to-debug collection timeout. Gate the skip on the sender’s own slot and keep logging an error for any other None.
Suggested fix
for (positional_idx, &real_party_id) in pending.recipient_party_ids.iter().enumerate() {
match share.extract_for_party(positional_idx) {
Some(party_share) => {
let c3a_proofs = signed_c3a_map
.get(&positional_idx)
.cloned()
.unwrap_or_default();
let c3b_proofs = signed_c3b_map
.get(&positional_idx)
.cloned()
.unwrap_or_default();
if let Err(err) = self.bus.publish(
ThresholdShareCreated {
e3_id: e3_id.clone(),
share: Arc::new(party_share),
target_party_id: real_party_id,
external: false,
signed_c2a_proof: Some(signed_c2a.clone()),
signed_c2b_proof: Some(signed_c2b.clone()),
signed_c3a_proofs: c3a_proofs,
signed_c3b_proofs: c3b_proofs,
},
ec.clone(),
) {
error!(
"Failed to publish ThresholdShareCreated for party {} (idx {}): {err}",
real_party_id, positional_idx
);
}
}
- None => {
- // Own slot is sparse (no self-encryption); nothing to publish.
- trace!(
- "Skipping ThresholdShareCreated for own slot (party {} idx {})",
- real_party_id,
- positional_idx
- );
- }
+ None if real_party_id == party_id => {
+ trace!(
+ "Skipping ThresholdShareCreated for own slot (party {} idx {})",
+ real_party_id,
+ positional_idx
+ );
+ }
+ None => {
+ error!(
+ "Missing ThresholdShare for non-self recipient {} (idx {})",
+ real_party_id,
+ positional_idx
+ );
+ }
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/zk-prover/src/actors/proof_request.rs` around lines 1244 - 1280, The
match arm for share.extract_for_party(positional_idx) currently treats every
None as an expected self-slot and just traces; change it to only skip (trace)
when the None corresponds to the sender’s own slot and treat any other None as
an error: inside the None branch compare the missing slot to the sender identity
(use the module’s sender/self positional identifier available in this
actor—e.g., the field that represents our own positional index or party id used
elsewhere with real_party_id/positional_idx), keep the trace and skip only for
the own-slot case, and for any other positional_idx log an error (include
real_party_id and positional_idx) so missing external recipient shares aren’t
silently dropped instead of being surfaced for upstream handling.
During threshold key generation, every node was previously encrypting a share of its own secret to itself, sending it over the wire, then decrypting it back, a wasteful round-trip with no security benefit (a node already knows its own share). This PR removes the self-loop end-to-end.
Summary by CodeRabbit
Release Notes