You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/spartan_parallel.md
+55-19Lines changed: 55 additions & 19 deletions
Original file line number
Diff line number
Diff line change
@@ -9,7 +9,7 @@
9
9
- (Runtime) Shift and program IO proofs
10
10
11
11
## High-Level Idea
12
-
The program is executed correctly iff all of the following holds:
12
+
The program, when divided into blocks, is executed correctly iff all of the following holds:
13
13
1. All blocks of the program are executed correctly
14
14
2. All registers (including the label of the next block) are passed correctly between blocks.
15
15
3. The memory state (read-only and RAM) stays coherent throughout the execution.
@@ -51,11 +51,11 @@ We note that the above steps imply the following additional procedures:
51
51
52
52
Permutations are checked via grand product proofs. Thus step 6 can be further divided into
53
53
54
-
* (6a) $RLC_{i, o} = \prod_{i, j} (\tau - f(i_{i, j}) - r\cdot f(o_{i, j}))$, where $\tau$ and $r$ are random numbers and $f$ is a random linear combination (see consistency below). Compute $RLC_p$, $RLC_v$ as well.
54
+
* (6a) $RLC_{i, o} = \prod_{i, j} (\tau - f(i_{i, j}) - r\cdot f(o_{i, j}))$, where $\tau$ and $r$ are random numbers and $f$ is a random linear combination (see consistency below). Compute $RLC_p = \prod_{i, j} (\tau - f(p_{i, j}))$, $RLC_v = \prod_{i, j} (\tau - f(v_{i, j}))$.
While the prover has computed $f(i'_k)$ and $f(o'_k)$ during permutation, it can use them for consistency check. We can rewrite _consistency_ as:
58
+
Since the prover has computed $f(i'_k)$ and $f(o'_k)$ during permutation, it can use them for consistency check. We can rewrite _consistency_ as:
59
59
60
60
2. $f(i'_k) = f(o'_{k-1})$ for all $k \neq 0$. (_consistency_)
61
61
@@ -89,16 +89,16 @@ Also, $\mathcal{C}'_i$ are the larger circuits while $\mathcal{C}_c$, $\mathcal{
89
89
90
90
The previous steps generate in total $b + 4$ circuits of various sizes. Our circuit commitment follows the existing batched sparse polynomial commitment scheme of Spartan. However, with circuits of different sizes, we want to only pay proportional to the approximate size of each circuit. The solution is to divide circuits of different sizes into groups and commit each groups separately.
91
91
92
-
Let each circuit $\mathcal{C}_i$ be of size $M_i\times N_i$ with $L_i$ non-zero entries. We assume that $M_i$ and $N_i$ are of similar sizes and $N_i$, and $L_i$ are roughly proportional to each other.
92
+
Let each circuit $\mathcal{C}_i$ be of size $M_i\times N_i$ with $L_i$ non-zero entries. We assume that $M_i$ and $N_i$ are of similar sizes and $N_i$ and $L_i$ are roughly proportional to each other.
93
93
For each commitment, the prover pays for $O(L)$ time to generate a proof of size $O(\log(N) + \log(L))$.
94
94
95
95
Our strategy is thus to group the circuits by the size of $N_i$. For each circuit $i$, the prover rounds its $N_i$ to the nearest power of 16 (or some other adjustable value) and put it in the corresponding group. For each group $j$, the prover computes the maximum $M_j$, $N_j$ and $L_j$ within the group, and batch commits every circuit of that group as $M_j \times N_j$ with $L_j$ non-zero entries.
96
96
97
97
There is, however, a problem with this approach with regard to the modified block circuits $\mathcal{C}'_i$. Recall that each $\mathcal{C}'_i$ matches with an assignment $z'_{i, j}$ of 5 segments:
each segment padded to the length of the block witness $|z_{i, j}|$. To perform the same sumcheck across all blocks, the size of all circuits needs to be _conceptually_ equivalent. (e,g, let $z_\text{max} = \max_{i, j} |z_{i, j}|$, for every block, the first variable of $r$ is always the $z_\text{max}$'th entry, the first variable of $rz_{i, j}$ is always the $2\times z_\text{max}$'th entry, etc.). However, for blocks of a different size, the padding size is different, and thus the first variable of $r$ will not be the same entry.
99
+
each segment padded to the length of the block witness $|z_{i, j}|$. To perform the same sumcheck across all blocks, the size of all circuits needs to be _conceptually_ equivalent. (e,g, let $z_\text{max} = \max_{i, j} |z_{i, j}|$, for every block, $r[0] \leftrightarrow z'_{i, j}[z_\text{max}]$, $rz_{i, j}[0] \leftrightarrow z'_{i, j}[2\times z_\text{max}]$, etc.). However, for blocks of a different size, the padding size is different, and thus $r[0]$ will not always be the same entry of $z'_{i, j}$.
100
100
101
-
The solution is for the prover to keep two versions of each block circuit (toggled by `COMMIT_MODE` in `gen_block_inst`). In the _sumcheck version_, every circuit is of the same $M_{\text{max}} = \max{M}$ and $N_{\text{max}} = \max{N}$. In the _commit version_, every circuit has $M$ and $N$ according to their group. Note that the prover's time and space cost to process a circuit is linear to their number of non-zero entries, which is the same for both versions. The prover can thus use the sumcheck version to perform the sumcheck of all blocks together, and use the commit version to reduce the time size of the commitment.
101
+
The solution is to provide the prover with two versions of each block circuit (toggled by `COMMIT_MODE` in `gen_block_inst`). In the _sumcheck version_, every circuit is of the same $M_{\text{max}} = \max{M}$ and $N_{\text{max}} = \max{N}$. In the _commit version_, every circuit has $M$ and $N$ according to their group. Note that the prover's time and space cost to process a circuit is linear to their number of non-zero entries, which is the same for both versions. The prover can thus use the sumcheck version to perform the sumcheck of all blocks together, and use the commit version to reduce the time size of the commitment.
102
102
103
103
The discrepancy between the two versions requires additional handling of commitment opening. The sumcheck produces two lists of challenges corresponding to the two dimensions of the circuit: $|rx| = \log M_{\text{max}}$, $|ry| = \log N_{\text{max}}$. On the constraint side, if $M_j < M_{\text{max}}$, the prover divides $rx \to rx_\text{pad} || rx_\text{eval}$. On the witness side, if $N_j < N_{\text{max}}$, the prover divides $ry\to ry_\text{comb} || ry_\text{pad} || ry_\text{eval}$. We describe each section:
104
104
* $rx_\text{pad}$ has length $(\log M_{\text{max}} - \log M_j)$ are the "extra" challenges
@@ -154,7 +154,7 @@ _XXX: we should be able to reduce the length to `total_num_phy_mem_accesses * 4`
154
154
155
155
All witnesses are committed using regular dense polynomial commitment schemes. `block_vars_matrix`, `block_w2`, `block_w3`, and `block_w3_shifted` are committed by each type of block. We note that we can use tricks similar to circuit commitment above to batch commit and batch open witness commitments.
156
156
157
-
## Sumcheck on Circuits and Instances
157
+
## Sumcheck Proof on Circuits and Instances
158
158
> Relevant files: `src/customdensepolynomial.rs`, `src/r1csproof.rs` and `src/sumcheck.rs`
159
159
160
160
The main section of `spartan_parallel` is consisted of three proofs, each with its own sumcheck and commitment opening. Each proof handles:
* $Q_i$ (`num_proofs`): number of assignments to each circuit $i$.
172
-
* $X$ (`num_cons`): _maximum_ number of constraints of any circuit
172
+
* $X$ (`num_cons`): _maximum_ number of constraints of any circuit.
173
173
* $W$ (`num_witness_secs`): number of padded segments of $z'_{i, j}$. In this case, $W = 8$.
174
-
* $Y$ (`max_num_inputs`): _maximum_ number of witnesses of any circuit
174
+
* $Y$ (`max_num_inputs`): _maximum_ number of witnesses of any circuit.
175
175
176
176
We use the lowercase version of each variable to denote their logarithmic value (e.g. $p = \log P$). Below we walkthrough the proving process of `spartan_parallel`.
177
177
@@ -180,7 +180,7 @@ The goal of Spartan is to prove that $Az \cdot Bz - Cz = 0$. This is separated i
To implement data-parallelism, we divide Spartan into 4 steps.
186
186
@@ -189,19 +189,55 @@ To implement data-parallelism, we divide Spartan into 4 steps.
189
189
190
190
While in regular Spartan, $Az$ is simply a length-$X$ vector, obtained by multiplication of a $X\times Y$ matrix $A$ by a length-$Y$ vector $z$, the data-paralleled version is slightly more complicated.
191
191
192
-
The prover's first task is to construct a `$P\times Q_i\times W\times Y_i` struct `z_mat` through a 4-dimensional vector. For reasons later illustrated in the sumcheck, the $Q_i$ and $Y_i$ sections of `z_mat` are stored _in bit-reverse_: let $Q_\text{max} = \max_i Q_i$, then for a circuit $i$ with $Q_i$ satisfying assignments, assignment $j$ will be stored in the entry:
where $\text{bit\_reverse}_{q_\text{max}}(x)$ expresses $x$ using $q_\text{max}$ bits and returns the value produced by assembling the bits from right to left.
195
-
196
-
For example, let $\max_i Q_i = 32$ and $Q_i = 8$ for a particular circuit $i$. The witnesses for execution 3 of the block is stored in entry $\text{bit\_reverse}_{\log 32}(3) = 11000_b = 24 * 8 / 32 = 6$.
192
+
The prover's first task is to construct a $P\times Q_i\times W\times Y_i$ struct `z_mat` through a 4-dimensional vector. The size of $Q_i$ and $Y_i$ depends on the entry of the $P$ dimension, while the size of $W$ does not. Further, even though sumcheck requires the size of each dimension to be _conceptually_ a power of 2, `z_mat` only needs to store the entries that contain non-zero values.
197
193
198
194
To obtain $Az$, $Bz$, $Cz$, the prover treats `z_mat` as $P$ counts of $Q_i \times (W \cdot Y_i)$ matrix. Since $A$, $B$, $C$ can be expressed as $P$ counts of $X_i\times (W \cdot Y_i)$ matrices, this allows the prover to perform $P$ matrix multiplications to obtain $P \times Q_i \times X_i$ tensors $Az$, $Bz$, $Cz$ and their MLE $\tilde{Az}$ (`poly_Az`), $\tilde{Bz}$, $\tilde{Cz}$. This process is described in `R1CSinstance::multiply_vec_block`. Note that:
199
195
* Conceptually, `poly_Az` of every block $i$ has $p + q_\text{max} + x_\text{max}$ variables. However, the value of the variables indexed at $[p, p + q_\text{max} - q_i)$ and $[p + q_\text{max}, p + q_\text{max} + x_\text{max} - x_i)$ does not affect the evaluation of the polynomial.
200
196
* Each circuit $i$ has different $Q_i$ and $X_i$, so $Az$ is expressed as a 3-dimensional vector, and the prover stores its MLE in a concise structure `DensePolynomialPqx`.
201
-
* For efficiency of the sumcheck, the $Q_i$ and $X_i$ sections of `poly_Az` are stored in bit-reverse. Recall that the $Q_i$ and $Y_i$ sections of `z_mat` are stored in bit-reverse, this means that, during matrix multiplication:
202
-
- The $Q_i$ dimension is already bit-reversed in `z_mat` and does not require additional action.
203
-
- The $X_i$ dimension is in its natural order in $A$, and thus needs to be reversed during multiplication.
204
-
- The $Y_i$ dimension is in its natural order in $A$ but in bit-reverse order in `z_mat`, so the dot product requires reversing one of the two.
`DensePolynomialPqx` is expressed as a 4-dimensional vector `Z` ($P\times Q_i\times W\times X_i$) and its relevant metadatas. However, every invocation of `DensePolynomialPqx` in the proof only uses 3 of the 4 dimensions, which are:
202
+
*`poly_Az`, `poly_Bz`, `poly_Cz`: uses $P$, $Q_i$ and $X_i$ dimension (so $W$ dimension always have length-1)
203
+
*`ABC_poly` binded to `rx`: uses $P$, $W$ and $X_i$ dimensions. Note that the $X_i$ dimension stores the witnesses and is actually the $Y_i$ dimension, but it behaves the same way as $X_i$.
204
+
*`Z_poly` binded to `rq`: uses $P$, $W$ and $X_i$ dimensions, where $X_i$ is again $Y_i$.
205
+
206
+
A `DensePolynomialPqx` can, however, contain all 4 dimensions. A full evaluations thus requires the binding of $p + q_\text{max} + w + x_\text{max}$ variables, which we divide into 4 sections: $r_p, r_q, r_w, r_x$.
207
+
208
+
We first note that binding from left to right can cause inefficiencies. This is because each binding on a variable of the $P$ dimension merges two vectors on the $Q_i$ dimension, which may be of different lengths, into a single vector of the longer length. As for a toy example, assume that a polynomial $G$ only has 2 dimensions $P\times Q_i$, and let $P = 4$ and $Q_i = [4, 4, 2, 2]$. The polynomial would thus contain 4 variables:
Since $Q_2 = Q_3 = 2$, $G_{2, 2}, G_{2, 3}, G_{3, 2}, G_{3, 3}$ are all 0s, thus the prover does not access nor perform operations on them. As a result, in the first round, the prover's work is $\sum_i Q_i = 12$ multiplications. However, after the first round, the prover is left with $P = 2$ and $Q_i = [4, 4]$. So its work binding $x_{p, 1}$ would be 8 multiplications.
216
+
217
+
Now consider the alternative of binding $x_{q, 1}$ first. All bindings are performed within the $Q$ dimension:
This again costs 12 multiplications. However, this time it leaves us with $P = 4$ and $Q_i = [2, 2, 1, 1]$, and the next binding of $x_{q, 0}$ costs only 6 multiplications.
224
+
225
+
As a result, binding on `DensePolynomialPqx` is always performed from right to left. Binding on an $r_x$ variable, for instance, would translate to:
226
+
```
227
+
max_num_inputs = max_num_inputs.div_ceil(2)
228
+
for p in 0..num_instances:
229
+
for q in 0..num_proofs[p]:
230
+
for w in 0..num_witness_secss:
231
+
num_inputs[p] = num_inputs[p].div_ceil(2)
232
+
for x in 0..num_inputs[p]:
233
+
Z[p][q][w][x] = (1 - r) * Z[p][q][w][2 * x]
234
+
if 2 * x + 1 < Z[p][q][w].len():
235
+
Z[p][q][w][x] += r * Z[p][q][w][2 * x + 1]
236
+
```
237
+
which merges entry $2x$ and $2x+1$ into entry $x$. Note that if `num_inputs[p] = 1`, then the binding simply multiplies the first and only entry by $1-r$ without further modification.
238
+
239
+
240
+
205
241
206
242
#### Sumcheck 1
207
243
> Relevant functions: `R1CSProof::prove_phase_one` and `SumcheckInstanceProof::prove_cubic_with_additive_term_disjoint_rounds`
0 commit comments