WIP sumcheck section

Jiangkm3 · Jiangkm3 · commit d0907b252a97 · 2024-12-22T22:06:59.000-05:00
diff --git a/docs/spartan_parallel.md b/docs/spartan_parallel.md
@@ -17,10 +17,10 @@ The program is executed correctly iff all of the following holds:
 Statement 1 can be checked directly through the block-specific circuits emitted by `circ_blocks`, while statements 2 and 3 can be checked by "extracting" inputs, outputs, and memory accesses out of block witnesses and checking that they are pairwise consistent. `spartan_parallel` achieves this by generating "extraction circuits" and "consistency check circuits" based on compile-time metadata (number of inputs, outputs, and memory accesses per block). Furthermore, the three statements require witnesses to be arranged in different orders (statement 1 by block type, statement 2 by execution time, statement 3 by memory address), so `spartan_parallel` inserts "permutation circuits" to verify the permutation between the three orderings: it constructs three univariate polynomials and tests their equivalence by evaluating them at a random point. However, to ensure that the same set of witnesses is used by both the block correctness check and the permutation check, the prover needs to use the same commitment for both proofs. To prevent excessive commitment opening, `spartan_parallel` commits the overlapping witnesses of block correctness and permutation separately.
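
The permutation test described above can be illustrated with a grand-product check. The following is only a minimal sketch under stated assumptions: the prime `P`, the helper names `mul` and `grand_product`, and the exact form $\prod_i (\tau - \sum_k r^k v_{i,k})$ are illustrative choices, not taken from the `spartan_parallel` sources. The names `tau` and `r` mirror the `perm_w0 = [tau, r, r^2, ...]` randomness described later in this document.

```rust
// Toy prime modulus, for illustration only (assumption, not the repository's field).
const P: u64 = 1_000_000_007;

/// Modular multiplication through u128 to avoid overflow.
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Evaluate prod_i (tau - sum_k r^k * entry_i[k]) mod P.
/// Each entry is first compressed by a random linear combination in `r`,
/// then treated as a root of a univariate polynomial evaluated at `tau`.
fn grand_product(entries: &[Vec<u64>], tau: u64, r: u64) -> u64 {
    entries.iter().fold(1u64, |acc, entry| {
        let mut pow = 1u64;
        let mut rlc = 0u64;
        for v in entry {
            rlc = (rlc + mul(pow, v % P)) % P;
            pow = mul(pow, r % P);
        }
        mul(acc, (tau % P + P - rlc) % P)
    })
}

fn main() {
    // The same (address, value) pairs, once ordered by time and once by address.
    let by_time = vec![vec![1u64, 10], vec![2, 20], vec![3, 30]];
    let by_addr = vec![vec![3u64, 30], vec![1, 10], vec![2, 20]];
    // Fixed stand-ins for the verifier's random challenges.
    let (tau, r) = (5u64, 7u64);
    let same = grand_product(&by_time, tau, r) == grand_product(&by_addr, tau, r);
    println!("products agree: {}", same);
}
```

Because multiplication is commutative, any reordering of the same entries yields the same product, while two different multisets of entries agree only with small probability over the choice of `tau` and `r`.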
 
 ## Circuit Preprocessing and Commitment (Compile Time)
-Relevant files: `examples/interface.rs`, `src/instance.rs`, and `src/r1csinstance.rs`
+> Relevant files: `examples/interface.rs`, `src/instance.rs`, and `src/r1csinstance.rs`
 
 ### Inputs from `circ_blocks`
-Relevant struct: `CompileTimeKnowledge` in `examples/interface.rs`
+> Relevant struct: `CompileTimeKnowledge` in `examples/interface.rs`
 
 At compile time, `spartan_parallel` reads in from `circ_blocks` through the struct `CompileTimeKnowledge`, including the R1CS circuit for each block (`args`) and all relevant metadata (number of inputs, witnesses, memory operations, etc. per block).
 
@@ -35,7 +35,7 @@ where
 * $w_i$ contains all other intermediate computations used by the block.
 
 ### Expanding and Generating Circuits
-Relevant struct: `Instance` in `src/instance.rs`
+> Relevant struct: `Instance` in `src/instance.rs`
 
 A prover of `spartan_parallel` needs to show the following:
 1. For every block $i$, the witness generated from every execution $j$ of that block $z_{i, j}$ satisfies $\mathcal{C}_i$. (_block correctness_)
@@ -80,7 +80,7 @@ Note that the verifier can check 6c efficiently without sumcheck.
 Also, $\mathcal{C}'_i$ are the larger circuits while $\mathcal{C}_c$, $\mathcal{C}_p$, $\mathcal{C}_v$, $\mathcal{C}_\pi$ are small and easily parallelizable.
 
 ### Committing Circuits through Sparse Poly Commitment
-Relevant functions:
+> Relevant functions:
 * `next_group_size` in `src/instance.rs`
 * `gen_block_inst` in `src/instance.rs`
 * `SNARK::multi_encode` in `src/lib.rs`
@@ -113,10 +113,10 @@ $$\mathcal{C}_\text{sumcheck}(rx || ry) =  (\prod_{r\in rx_\text{pad} || ry_\tex
 So the opening is performed on $(rx_\text{eval} || ry_\text{comb} || ry_\text{eval})$, and the verifier checks the result by computing and multiplying by $\prod_{r\in rx_\text{pad} || ry_\text{pad}} (1 - r)$.
 
 ## Witness Preprocessing and Generation
-Relevant files: `examples/interface.rs` and `src/lib.rs`
+> Relevant files: `examples/interface.rs` and `src/lib.rs`
 
 ### Inputs from `circ_blocks`
-Relevant struct: `RunTimeKnowledge` in `examples/interface.rs`
+> Relevant struct: `RunTimeKnowledge` in `examples/interface.rs`
 
 At runtime, `spartan_parallel` reads in from `circ_blocks` through the struct `RunTimeKnowledge`, which describes all the witnesses generated from the blocks:
 * `block_vars_matrix`: all the inputs, outputs, memory accesses, and intermediate computations of every block execution, grouped by block type.
@@ -126,7 +126,7 @@ At runtime, `spartan_parallel` reads in from `circ_blocks` through the struct `R
 * `addr_ts_bits_list`: bit split of timestamp difference, used by memory coherence check.
 
 ### Witness Preprocessing and Commitment
-Relevant file: `src/lib.rs`
+> Relevant file: `src/lib.rs`
 
 Apart from the witnesses provided by each block execution, the prover also needs to compute additional witnesses used by permutation and consistency checks. This includes, most notably:
 * `perm_w0 = [tau, r, r^2, ...]`: the randomness used by the random linear permutation. This value can be efficiently generated by the verifier and does not require commitment.
@@ -155,7 +155,7 @@ _XXX: we should be able to reduce the length to `total_num_phy_mem_accesses * 4`
 All witnesses are committed using regular dense polynomial commitment schemes. `block_vars_matrix`, `block_w2`, `block_w3`, and `block_w3_shifted` are committed by each type of block. We note that we can use tricks similar to circuit commitment above to batch commit and batch open witness commitments.
 
 ## Sumcheck on Circuits and Instances
-Relevant files: `src/customdensepolynomial.rs`, `src/r1csproof.rs` and `src/sumcheck.rs`
+> Relevant files: `src/customdensepolynomial.rs`, `src/r1csproof.rs` and `src/sumcheck.rs`
 
 The main section of `spartan_parallel` consists of three proofs, each with its own sumcheck and commitment opening. Each proof handles:
 1. Block correctness and grand product on block-ordered witnesses
@@ -175,3 +175,40 @@ We denote the following parameters for the proof:
 
 We use the lowercase version of each variable to denote its logarithm (e.g. $p = \log P$). Below we walk through the proving process of `spartan_parallel`.
 
+The goal of Spartan is to prove that $Az \cdot Bz - Cz = 0$, where $\cdot$ denotes the entry-wise product. The proof is separated into two sumchecks:
+* Sumcheck 1 proves that, given purported polynomial extensions $\tilde{Az}, \tilde{Bz}, \tilde{Cz}$,
+$$\sum \tilde{\text{eq}} \cdot (\tilde{Az} \cdot \tilde{Bz} - \tilde{Cz}) = 0$$
+* Sumcheck 2 proves that, given purported polynomial extensions $\tilde{A}, \tilde{B}, \tilde{C}, \tilde{z}$,
+$$(r_A\cdot \tilde{A} + r_B\cdot \tilde{B} + r_C\cdot \tilde{C})\cdot \tilde{z} = r_A\cdot \tilde{Az} + r_B\cdot \tilde{Bz} + r_C\cdot \tilde{Cz}$$
+for some random $r_A$, $r_B$, $r_C$.
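
To make the relation being proved concrete, here is a minimal sketch that checks $Az \cdot Bz - Cz = 0$ entry by entry on a tiny R1CS instance. It only illustrates the statement, not the sumcheck protocol itself. The modulus `P`, the function names, and the example circuit ($x \cdot x = y$ and $(x + y) \cdot 1 = \text{out}$) are assumptions made for illustration and do not come from the repository.

```rust
// Toy prime modulus, for illustration only (assumption).
const P: u64 = 97;

/// Dense matrix-vector product mod P.
fn mat_vec(m: &[Vec<u64>], z: &[u64]) -> Vec<u64> {
    m.iter()
        .map(|row| row.iter().zip(z).fold(0u64, |acc, (a, b)| (acc + a * b) % P))
        .collect()
}

/// Returns true iff (Az)_k * (Bz)_k - (Cz)_k = 0 mod P for every constraint k.
fn r1cs_satisfied(a: &[Vec<u64>], b: &[Vec<u64>], c: &[Vec<u64>], z: &[u64]) -> bool {
    let (az, bz, cz) = (mat_vec(a, z), mat_vec(b, z), mat_vec(c, z));
    az.iter()
        .zip(&bz)
        .zip(&cz)
        .all(|((x, y), w)| (x * y + P - w) % P == 0)
}

fn main() {
    // Variables: z = [1, x, y, out] with constraints x * x = y and (x + y) * 1 = out.
    let a = vec![vec![0, 1, 0, 0], vec![0, 1, 1, 0]];
    let b = vec![vec![0, 1, 0, 0], vec![1, 0, 0, 0]];
    let c = vec![vec![0, 0, 1, 0], vec![0, 0, 0, 1]];
    let z = vec![1, 3, 9, 12];
    println!("satisfied: {}", r1cs_satisfied(&a, &b, &c, &z));
}
```

Sumcheck 1 replaces this entry-by-entry check with a single randomized sum over the multilinear extensions, and sumcheck 2 ties the claimed $\tilde{Az}$, $\tilde{Bz}$, $\tilde{Cz}$ back to the committed matrices and witness.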
+
+To implement data-parallelism, we divide Spartan into 4 steps.
+
+#### Obtaining $\tilde{Az}, \tilde{Bz}, \tilde{Cz}$
+> Relevant files: `src/r1csinstance.rs` and `src/customdensepolynomial.rs`
+
+While in regular Spartan $Az$ is simply a length-$X$ vector, obtained by multiplying an $X\times Y$ matrix $A$ by a length-$Y$ vector $z$, the data-parallel version is slightly more complicated.
+
+The prover's first task is to construct a $P\times Q_i\times W\times Y_i$ struct `z_mat` as a 4-dimensional vector. For reasons illustrated later in the sumcheck, the $Q_i$ and $Y_i$ sections of `z_mat` are stored _in bit-reverse_: let $Q_\text{max} = \max_i Q_i$, then for a circuit $i$ with $Q_i$ satisfying assignments, assignment $j$ is stored in the entry:
+$$\text{bit\_reverse}_{q_\text{max}}(j) \cdot (Q_i / Q_\text{max})$$
+where $\text{bit\_reverse}_{q_\text{max}}(x)$ expresses $x$ using $q_\text{max}$ bits and returns the value produced by assembling the bits from right to left.
+
+For example, let $\max_i Q_i = 32$ and $Q_i = 8$ for a particular circuit $i$. Since $\text{bit\_reverse}_{\log 32}(3) = 11000_b = 24$, the witnesses for execution 3 of the block are stored in entry $24 \cdot 8 / 32 = 6$.
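
The index computation above can be sketched as follows. The helper names `bit_reverse` and `storage_index` are hypothetical and chosen for illustration. The sketch assumes that $Q_i$ and $Q_\text{max}$ are powers of two, which is what makes the division exact.

```rust
/// Reverse the lowest `nbits` bits of `x` (hypothetical helper).
fn bit_reverse(x: usize, nbits: u32) -> usize {
    let mut out: usize = 0;
    for k in 0..nbits {
        if (x >> k) & 1 == 1 {
            out |= 1 << (nbits - 1 - k);
        }
    }
    out
}

/// Entry of `z_mat` holding assignment `j` of a circuit with `q_i` assignments,
/// where `q_max` is the largest assignment count over all circuits.
/// Assumes both counts are powers of two.
fn storage_index(j: usize, q_i: usize, q_max: usize) -> usize {
    assert!(q_i.is_power_of_two() && q_max.is_power_of_two() && q_i <= q_max);
    assert!(j < q_i);
    let log_q_max = q_max.trailing_zeros();
    // j < q_i, so the reversed value is a multiple of q_max / q_i and the division is exact.
    bit_reverse(j, log_q_max) * q_i / q_max
}

fn main() {
    // The example from the text: Q_max = 32, Q_i = 8, execution 3.
    println!("entry for execution 3: {}", storage_index(3, 8, 32));
    let all: Vec<usize> = (0..8).map(|j| storage_index(j, 8, 32)).collect();
    println!("entries for executions 0..8: {:?}", all);
}
```

For a fixed circuit, the result coincides with reversing $j$ over $q_i$ bits, so the entries of each circuit form a permutation of $0, \dots, Q_i - 1$.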
+
+To obtain $Az$, $Bz$, $Cz$, the prover treats `z_mat` as $P$ matrices of size $Q_i \times (W \cdot Y_i)$. Since $A$, $B$, $C$ can be expressed as $P$ matrices of size $X_i\times (W \cdot Y_i)$, the prover can perform $P$ matrix multiplications to obtain the $P \times Q_i \times X_i$ tensors $Az$, $Bz$, $Cz$ and their MLEs $\tilde{Az}$ (`poly_Az`), $\tilde{Bz}$, $\tilde{Cz}$. This process is implemented in `R1CSinstance::multiply_vec_block`. Note that:
+* Conceptually, `poly_Az` of every block $i$ has $p + q_\text{max} + x_\text{max}$ variables. However, the values of the variables indexed at $[p, p + q_\text{max} - q_i)$ and $[p + q_\text{max}, p + q_\text{max} + x_\text{max} - x_i)$ do not affect the evaluation of the polynomial.
+* Each circuit $i$ has different $Q_i$ and $X_i$, so $Az$ is expressed as a 3-dimensional vector, and the prover stores its MLE in a concise structure `DensePolynomialPqx`.
+* For efficiency of the sumcheck, the $Q_i$ and $X_i$ sections of `poly_Az` are stored in bit-reverse. Since the $Q_i$ and $Y_i$ sections of `z_mat` are also stored in bit-reverse, this means that, during matrix multiplication:
+  - The $Q_i$ dimension is already bit-reversed in `z_mat` and does not require additional action.
+  - The $X_i$ dimension is in its natural order in $A$, and thus needs to be reversed during multiplication.
+  - The $Y_i$ dimension is in its natural order in $A$ but in bit-reverse order in `z_mat`, so the dot product requires reversing one of the two.
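
The index handling in the bullets above can be sketched for a single block and a single execution. This is a simplified illustration under stated assumptions: the matrix is dense, the field is a toy prime `P`, and the function `multiply_bit_reversed` is a hypothetical name. The actual `multiply_vec_block` may be structured differently.

```rust
// Toy prime modulus, for illustration only (assumption).
const P: u64 = 97;

/// Reverse the lowest `nbits` bits of `x` (hypothetical helper).
fn bit_reverse(x: usize, nbits: u32) -> usize {
    let mut out: usize = 0;
    for k in 0..nbits {
        if (x >> k) & 1 == 1 {
            out |= 1 << (nbits - 1 - k);
        }
    }
    out
}

/// `a` is an X x Y matrix in natural order; `z_rev` stores z[y] at index bit_reverse(y).
/// Returns Az with row x stored at index bit_reverse(x).
fn multiply_bit_reversed(a: &[Vec<u64>], z_rev: &[u64]) -> Vec<u64> {
    let (x_len, y_len) = (a.len(), z_rev.len());
    assert!(x_len.is_power_of_two() && y_len.is_power_of_two());
    let (log_x, log_y) = (x_len.trailing_zeros(), y_len.trailing_zeros());
    let mut az_rev = vec![0u64; x_len];
    for (x, row) in a.iter().enumerate() {
        let mut acc = 0u64;
        for (y, a_xy) in row.iter().enumerate() {
            // Y dimension: natural order in A, bit-reversed in z, so reverse the lookup.
            acc = (acc + a_xy * z_rev[bit_reverse(y, log_y)]) % P;
        }
        // X dimension: natural order in A, bit-reversed in the output.
        az_rev[bit_reverse(x, log_x)] = acc;
    }
    az_rev
}

fn main() {
    let a = vec![
        vec![1, 1, 0, 0],
        vec![0, 1, 1, 0],
        vec![0, 0, 1, 1],
        vec![1, 0, 0, 1],
    ];
    // z = [1, 2, 3, 4] in natural order, stored here in bit-reversed order.
    let z_rev = vec![1, 3, 2, 4];
    println!("{:?}", multiply_bit_reversed(&a, &z_rev));
}
```

In natural order $Az = [3, 5, 7, 5]$ for this input, so the bit-reversed output swaps the two middle entries.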
+
+#### Sumcheck 1
+> Relevant functions: `R1CSProof::prove_phase_one` and `SumcheckInstanceProof::prove_cubic_with_additive_term_disjoint_rounds`
+
+As in regular Spartan, sumcheck 1 is of the following form:
+$$\sum \tilde{\text{eq}} \cdot (\tilde{Az} \cdot \tilde{Bz} - \tilde{Cz}) = 0$$
+
+The difference is that $\tilde{Az}$, $\tilde{Bz}$, and $\tilde{Cz}$ are now $(p + q_\text{max} + x_\text{max})$-variate polynomials, so the sumcheck involves $p + q_\text{max} + x_\text{max}$ rounds and returns the challenge $r = r_p || r_q || r_x$. However, we want the prover to perform only $\sum_i Q_i \cdot X_i$ computations (as opposed to $P \cdot Q_\text{max} \cdot X_\text{max}$).
+
+We first note that the bindings of the $P$ variables must take place last