Feat/streaming_prover #1114
base: main
Conversation
- Merging working spartan outer into the new api
- Merging working spartan outer into the new api. - things don't compile yet
R1CS eval changes brought in
Tests seem to pass. Double check with gitools if things are merged.
Tests seem to pass. Think I have everything.
Tests pass (at least the ones that were passing)
Typos on expanding table
Optimising stream to linear started
The linear schedule is fine, but the streaming schedule re-computation is highly suboptimal.
A much faster re-computation of Az/Bz with parallel code
Stream to linear is pretty much as fast as it needs to be.
jolt-core/src/zkvm/spartan/outer.rs
Outdated
```rust
let grid_az_ptr = grid_az.as_mut_ptr() as usize;
let grid_bz_ptr = grid_bz.as_mut_ptr() as usize;
let chunk_size = 4096;
let num_chunks = (jlen + chunk_size - 1) / chunk_size;
(0..num_chunks).into_par_iter().for_each(move |chunk_idx| {
    let start = chunk_idx * chunk_size;
    let end = (start + chunk_size).min(jlen);

    let az_ptr = grid_az_ptr as *mut F;
    let bz_ptr = grid_bz_ptr as *mut F;

    for j in start..end {
        let az_j = acc_az[j].barrett_reduce();
        let bz_first_j = acc_bz_first[j].barrett_reduce();
        let bz_second_j = acc_bz_second[j].barrett_reduce();

        unsafe {
            *az_ptr.add(j) = az_j;
            *bz_ptr.add(j) = bz_first_j + bz_second_j;
        }
    }
});
```
There appears to be a potential race condition in this parallel reduction. Each thread is using the same raw pointers (`az_ptr` and `bz_ptr`) and indexing with the absolute `j` value rather than a chunk-relative index.
To fix this, either:
- Use slice-based access instead of raw pointers: `grid_az[j] = az_j;`
- Or, if using pointers for performance, adjust the index to be relative to the start of each chunk: `*az_ptr.add(j) = az_j;` should be `*az_ptr.add(start + (j - start)) = az_j;`, or more simply `*az_ptr.add(j) = az_j;`.
The current approach could lead to threads writing to overlapping memory locations if the chunk boundaries aren't calculated correctly.
Spotted by Graphite Agent
A functional version of this has been implemented, but I'll move the design around. This is at least correct.
quangvdao
left a comment
Thanks! There are a number of changes I'd like to land before this is merge-able:
- Can you clean up all the TODOs, fix the performance regression, and streamline the code?
- When switching to linear time, ideally we should compute the evals of that round as we materialize, rather than doing another pass over the bound evals
- More generally, I think the following flow might be preferable for computing the prover message (I think I've talked about this a number of times):
```rust
fn compute_message(...) {
    // 1. Build the multiquadratic eval for this window, if it does not exist (or the number of variables has fallen to 0)
    //    1.a) If we are streaming from trace, and we don't want to materialize for this window (i.e. not the last window), call a `stream_eval` kernel
    //    1.b) If we are streaming from trace, and we want to materialize for this window (i.e. last window of streaming), call a `stream_materialize_eval` kernel
    //    1.c) If we have materialized, call a `materialize_eval` kernel
    // 2. From the multiquadratic poly, derive the current round's uni poly.
}
```
This allows us to be flexible on the number of rounds to compute, even when we have the materialized values. It is rare that we will use it on CPU, but we will most certainly use it on GPU, and it would be good to have a reference CPU version. It will also collapse into the linear-time version if you compute smartly (the multiquadratic evals should omit / not compute the eval at (1, 1, ..., 1)).
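For reference, a minimal CPU-side sketch of what that dispatch could look like; every name here (`EvalSource`, the three `*_eval` stubs, `to_round_poly`) is a placeholder rather than the actual Jolt API, and the stub bodies are dummies.
```rust
// Placeholder dispatch for the compute_message flow described above.
enum EvalSource {
    Streaming,          // 1.a: stream from trace, don't materialize (not the last window)
    StreamingLastWindow, // 1.b: stream from trace and materialize (last streaming window)
    Materialized,       // 1.c: bound evals are already materialized in memory
}

fn stream_eval() -> Vec<u64> { vec![0; 8] }             // stub
fn stream_materialize_eval() -> Vec<u64> { vec![0; 8] } // stub
fn materialize_eval() -> Vec<u64> { vec![0; 8] }        // stub
fn to_round_poly(evals: &[u64]) -> Vec<u64> { evals.to_vec() } // stub

fn compute_message(source: EvalSource, multiquadratic: &mut Option<Vec<u64>>) -> Vec<u64> {
    // 1. Build the multiquadratic evals for this window if they don't exist yet.
    if multiquadratic.is_none() {
        let evals = match source {
            EvalSource::Streaming => stream_eval(),
            EvalSource::StreamingLastWindow => stream_materialize_eval(),
            EvalSource::Materialized => materialize_eval(),
        };
        *multiquadratic = Some(evals);
    }
    // 2. Derive the current round's univariate poly from the multiquadratic evals.
    to_round_poly(multiquadratic.as_ref().unwrap())
}
```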
NEXT: Proper parallelisation. Fused materialisation (this is currently OK for grids of window size 1), but not generalised.
…nows. Well that's next.
For now we've adapted the API to suit the above.
moodlezoup
left a comment
Which of the new OuterRemainingSumcheckProver methods are going to be used in practice? I see that several method names are currently prefixed with an underscore, and some appear to be serial algorithms with a parallel counterpart. Let's delete anything that we're not going to use (including some of the stuff that's currently commented out).
```rust
let eval_at_inf = self.evals[old_base_idx + 2]; // z_0 = ∞

self.evals[new_idx] =
    eval_at_0 * (one - r) + eval_at_1 * r + eval_at_inf * r * (r - one);
```
nit: can pull r * (r - one) out of the for loop
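For illustration, a hedged fragment of what the hoisted version might look like (loop bounds and index bookkeeping from the surrounding code are elided):
```rust
// Hoist the loop-invariant factors once, before the for loop.
let one_minus_r = one - r;
let r_times_r_minus_one = r * (r - one);
// ... then, inside the loop body:
self.evals[new_idx] =
    eval_at_0 * one_minus_r + eval_at_1 * r + eval_at_inf * r_times_r_minus_one;
```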
I'll confirm the ones that'll be used after benching them. They'll likely be some specialised ones for small window sizes for speed.
I recommend keeping some of the serial implementations -- as the parallel code can often be hard to follow -- and there's now a fair amount of nesting.
```rust
let lazy_trace_iter_ = lazy_trace_iter.clone();
// NOTE: this will materialise the trace in full
let trace: Vec<Cycle> = lazy_trace_iter.by_ref().collect();
//let trace = lazy_trace_iter.by_ref();
```
nit: delete
```rust
// Clone and advance to target cycle
let mut iter = checkpoints[checkpoint_idx].clone();
for _ in 0..offset {
    iter.next();
}
```
We should only clone each checkpoint once for the checkpoint_interval cycles it's responsible for.
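Something along these lines might work; `checkpoints`, `checkpoint_interval`, and `process_cycle` are assumed names standing in for the surrounding code.
```rust
// Clone each checkpoint iterator once and walk its whole interval, instead of
// re-cloning and skipping `offset` cycles for every target cycle.
checkpoints.iter().enumerate().for_each(|(checkpoint_idx, checkpoint)| {
    let mut iter = checkpoint.clone(); // one clone per checkpoint
    for offset in 0..checkpoint_interval {
        let Some(cycle) = iter.next() else { break };
        let global_idx = checkpoint_idx * checkpoint_interval + offset;
        process_cycle(global_idx, cycle); // placeholder for the per-cycle work
    }
});
```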
```rust
// Streaming windows are not defined for HighToLow in the current
// Spartan code paths; return neutral head tables.
(&self.one_table, &self.one_table)
```
Seems like this should be `unimplemented!` for now?
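i.e., something like the following in place of returning the neutral tables (the message string is just illustrative):
```rust
// Fail loudly if a HighToLow caller ever reaches this path, rather than
// silently returning neutral head tables.
unimplemented!("streaming head tables are not defined for HighToLow binding order")
```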
```rust
}
BindingOrder::HighToLow => {
    // Not used for the outer Spartan streaming code.
    vec![F::one()]
```
same here, prefer `unimplemented!`
```rust
/// Degree bound of the sumcheck round polynomials for [`OuterRemainingSumcheckVerifier`].
const OUTER_REMAINING_DEGREE_BOUND: usize = 3;
const INFINITY: usize = 2; // 2 represents ∞ in base-3
```
This is kind of confusing to me, doesn't 2 represent 2 in base-3? 😅
You're right. This is a remnant from previous naming -- the actual value does not matter. I'll change this, thanks
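A possible rename, for illustration only (`INFINITY_SLOT` is a hypothetical name; per the binding snippet above, 2 is the slot index of the ∞ evaluation within each per-variable (0, 1, ∞) triple):
```rust
// Hypothetical rename: 2 is the index of the ∞ evaluation within each
// (eval at 0, eval at 1, eval at ∞) triple, not a base-3 digit meaning ∞.
const INFINITY_SLOT: usize = 2;
```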
```rust
first_round_evals: (F, F),
#[allocative(skip)]
params: OuterRemainingSumcheckParams<F>,
lagrange_evals_r0: [F; 10],
```
where does the number 10 come from? would be good to use a named const here
It should be OUTER_SKIP_DOMAIN_SIZE
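i.e., roughly the following, assuming OUTER_SKIP_DOMAIN_SIZE is (or will be) a named constant equal to 10 in this module:
```rust
const OUTER_SKIP_DOMAIN_SIZE: usize = 10;
// ... and in the struct field:
lagrange_evals_r0: [F; OUTER_SKIP_DOMAIN_SIZE],
```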
jolt-core/src/zkvm/spartan/outer.rs
Outdated
```rust
let grid_az_ptr = grid_az.as_mut_ptr() as usize;
let grid_bz_ptr = grid_bz.as_mut_ptr() as usize;
let chunk_size = 4096;
```
should this be `min(4096, jlen)`?
jolt-core/src/zkvm/spartan/outer.rs
Outdated
```rust
unsafe {
    *az_ptr.add(j) = az_j;
    *bz_ptr.add(j) = bz_first_j + bz_second_j;
}
```
instead of this unsafe pointer stuff, can't we just use `grid_az.par_chunks_mut(chunk_size).zip(grid_bz.par_chunks_mut(chunk_size)).enumerate()` or something?
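For reference, a safe slice-based sketch along those lines; it assumes rayon's prelude is in scope and that `acc_az`, `acc_bz_first`, and `acc_bz_second` remain indexable by the absolute position `j`, as in the original.
```rust
// Each parallel task gets disjoint mutable chunks of grid_az / grid_bz,
// so no raw pointers or unsafe writes are needed.
grid_az
    .par_chunks_mut(chunk_size)
    .zip(grid_bz.par_chunks_mut(chunk_size))
    .enumerate()
    .for_each(|(chunk_idx, (az_chunk, bz_chunk))| {
        let start = chunk_idx * chunk_size;
        for (offset, (az_out, bz_out)) in
            az_chunk.iter_mut().zip(bz_chunk.iter_mut()).enumerate()
        {
            let j = start + offset;
            *az_out = acc_az[j].barrett_reduce();
            *bz_out = acc_bz_first[j].barrett_reduce() + acc_bz_second[j].barrett_reduce();
        }
    });
```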
```rust
let mut acc_bz_first = vec![Acc6S::<F>::zero(); jlen];
let mut acc_bz_second = vec![Acc7S::<F>::zero(); jlen];

if !parallel {
```
Are we ever going to use the serial version in practice?
A generalised sum-check API.
NOTE: Parallel performance of compute message is sub-optimal right now, due to the increased complexity of building a general-sized multi-quadratic for window sizes >= 1.