Summary
ZarrExec currently materializes the entire query result as a single RecordBatch before returning it to DataFusion. For large Cartesian products (e.g., 365 time steps × 100 lat × 100 lon = 3.65M rows), this means the full dataset must fit in memory at once. The execution should yield multiple smaller batches to enable streaming and reduce peak memory.
Current Behavior
All three execution paths in `src/physical_plan/zarr_exec.rs` follow the same pattern:
```rust
// zarr_exec.rs:305 (remote), :379 (virtualizarr), :445 (virtualizarr+adapter)
let batches: Vec<RecordBatch> = result_stream.try_collect().await?;
Ok::<_, DataFusionError>(stream::iter(batches.into_iter().map(Ok)))
```
The local sync path (`execute_local`, line ~326) reads everything into a single `RecordBatch` via `read_zarr()` and wraps it in a one-element stream.
This means:
- Memory: Peak memory = full result size (no streaming/backpressure)
- Latency: First row is only available after ALL rows are read
- LIMIT inefficiency: Even with limit pushdown, the batch is fully materialized before slicing
Proposed Change
Chunk the Cartesian product expansion into batches of ~8,192 rows:
- In `read_zarr()` / `read_zarr_async()`: instead of building one giant `RecordBatch`, yield batches by iterating over row ranges `[0..8192)`, `[8192..16384)`, ...
- In `ZarrExec::execute()`: return the chunked stream directly without `.try_collect()`
- Coordinate keys: compute per-batch using the existing row-major formula (the formula in `create_coord_dictionary_typed` already takes `total_rows`; change it to accept a row range)
- Data variables: slice the nD array per batch rather than loading the full flattened column
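The chunking logic described above can be sketched with std-only Rust (the function names, the 3-D shape, and the flat-index decomposition are illustrative stand-ins, not the crate's actual API):

```rust
// Decompose a flat row-major row index into per-dimension coordinate
// indices — the inverse of the row-major expansion of the Cartesian product.
fn coord_indices_for_row(mut row: usize, shape: &[usize]) -> Vec<usize> {
    let mut idx = vec![0; shape.len()];
    for (i, &dim) in shape.iter().enumerate().rev() {
        idx[i] = row % dim;
        row /= dim;
    }
    idx
}

// Yield (start, end) row ranges of at most `batch_size` rows covering
// the full Cartesian product; each range becomes one RecordBatch.
fn batch_ranges(total_rows: usize, batch_size: usize) -> impl Iterator<Item = (usize, usize)> {
    (0..total_rows)
        .step_by(batch_size)
        .map(move |start| (start, (start + batch_size).min(total_rows)))
}

fn main() {
    let shape = [365, 100, 100]; // time × lat × lon
    let total_rows: usize = shape.iter().product(); // 3,650,000
    let ranges: Vec<_> = batch_ranges(total_rows, 8192).collect();
    println!("{} batches of ≤8,192 rows", ranges.len());
    // Coordinate indices for the first row of the second batch:
    println!("{:?}", coord_indices_for_row(8192, &shape));
}
```

Because each batch is derived purely from its `(start, end)` range, batches can be produced one at a time with no reference to earlier ones.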
Batch size considerations
- 8,192 rows is the common DataFusion default (the `batch_size` config option)
- Arrow's memory allocator works well with predictable batch sizes
- Smaller batches enable DataFusion's pipeline execution (hash joins, aggregations start before full scan)
Impact
- Memory: O(batch_size) instead of O(total_rows) for the Cartesian product
- Latency: First batch available after reading ~8K rows worth of data
- LIMIT: `LIMIT 10` on a 3.65M-row result now reads ≤8,192 rows instead of 3.65M
- DataFusion integration: Enables proper pipeline execution and backpressure
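The LIMIT arithmetic can be checked with a std-only sketch (batch production is simulated with row counts standing in for `RecordBatch`es; `rows_read_for_limit` is a hypothetical helper, not crate code):

```rust
// Count how many rows are actually materialized when a LIMIT consumer
// stops pulling batches as soon as it has enough rows.
fn rows_read_for_limit(total_rows: usize, batch_size: usize, limit: usize) -> usize {
    let mut read = 0;
    for start in (0..total_rows).step_by(batch_size) {
        read += batch_size.min(total_rows - start); // this batch is produced...
        if read >= limit {
            break; // ...but no further batches are
        }
    }
    read
}

fn main() {
    // LIMIT 10 over 3.65M rows touches only the first 8,192-row batch.
    println!("{} rows read", rows_read_for_limit(3_650_000, 8192, 10));
    // With a single giant batch (current behavior), everything is read.
    println!("{} rows read", rows_read_for_limit(3_650_000, 3_650_000, 10));
}
```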
Files to Modify
- `src/reader/zarr_reader.rs` — `read_zarr()` and `read_zarr_async()`: return `impl Stream<Item = Result<RecordBatch>>` instead of a single batch
- `src/reader/coord.rs` — `create_coord_dictionary_typed()`: accept a row offset + batch length
- `src/physical_plan/zarr_exec.rs` — remove the `.try_collect()` pattern; pass the stream through directly
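Why removing `.try_collect()` matters can be shown with plain iterators as a stand-in for the async stream (the counter and function are illustrative only):

```rust
use std::cell::Cell;

// Count how many source batches get produced when a consumer pulls only
// `take` items from a lazy "stream" of `total` batches.
fn produced_when_taking(total: usize, take: usize) -> usize {
    let produced = Cell::new(0usize);
    let stream = (0..total).map(|i| {
        produced.set(produced.get() + 1);
        i
    });
    // Lazy passthrough (proposed): downstream pulls drive production.
    // An eager `.try_collect()` (current) would instead force all `total`
    // batches before the first one is visible downstream.
    let _ = stream.take(take).count();
    produced.get()
}

fn main() {
    println!(
        "lazy passthrough produced {} of 446 batches",
        produced_when_taking(446, 1)
    );
}
```

The same principle carries over to `futures::Stream`: passing the chunked stream through lets DataFusion's pull-based executor decide how many batches are ever computed.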
Motivation
Inspired by Vortex's `ChunkedArray`, which enables streaming output with constant memory. DataFusion's execution model is designed for streaming batches — the current single-batch approach defeats pipeline execution, backpressure, and early termination.