Summary
Currently, ConnectorX loads the entire query result into memory before returning it as a Vec.
This makes it inefficient, or even infeasible, for large datasets, especially in a long-running server process, where memory usage can easily grow to multiple gigabytes and trigger an OOM kill.
I’m using the Rust implementation of ConnectorX in a server context and would like to fetch and propagate results in a streaming fashion instead of materializing everything at once.
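For reference, the current entry point materializes the full result before returning. A minimal sketch of today's usage, adapted from the ConnectorX README (exact signatures may differ across versions):

use connectorx::prelude::*;

let source_conn = SourceConn::try_from("postgres://user:pass@localhost:5432/db")?;
let queries = &[CXQuery::from("SELECT * FROM big_table")];

// get_arrow runs the (possibly partitioned) query and buffers everything;
// only once the last row has arrived does the caller see any data.
let destination = get_arrow(&source_conn, None, queries)?;
let batches = destination.arrow()?; // Vec<RecordBatch> holding the entire result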
Problem
ConnectorX’s design focuses on parallelized, in-memory loading of data.
While this is optimal for batch ETL and data science workflows, it’s not suitable for:
- Long-running backend services
- APIs that need to progressively send data to clients
- Large analytical queries that don’t fit in memory
When dealing with tens or hundreds of millions of rows, the current API (returning a Vec) forces the entire result to be held in RAM at once; at, say, 100 bytes per row, 100 million rows already comes to roughly 10 GB.
Proposed Solution
Introduce a streaming interface to ConnectorX’s Rust API, e.g.:
let mut stream = cx_stream::<PostgresArrowTransport>(&source_conn, &query)?;
while let Some(batch) = stream.next().await {
    let batch = batch?; // each item would be a Result<RecordBatch>
    // process or forward each RecordBatch incrementally
}
This could work similarly to how Arrow Flight streams record batches, or to Polars' streaming mode, allowing users to:
- Fetch batches progressively from the database driver
- Process or forward each batch without holding the full result in memory (see the sketch below)
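To make the intent concrete, here is a minimal server-side sketch, assuming the hypothetical cx_stream above yields Result<RecordBatch> items (connectorx imports elided). Each batch is encoded as an Arrow IPC frame with arrow-rs's StreamWriter and pushed into a bounded tokio channel, e.g. one feeding an HTTP response body:

use arrow::ipc::writer::StreamWriter;
use futures::StreamExt;
use tokio::sync::mpsc;

// Hypothetical consumer of the proposed API: only the batch currently being
// encoded (plus whatever sits in the channel) is ever resident in memory.
async fn forward_batches(
    source_conn: &SourceConn,
    query: &str,
    tx: mpsc::Sender<Vec<u8>>,
) -> Result<(), Box<dyn std::error::Error>> {
    // cx_stream does not exist yet; it is the API proposed above.
    let mut stream = cx_stream::<PostgresArrowTransport>(source_conn, query)?;
    while let Some(batch) = stream.next().await {
        let batch = batch?;
        let mut buf = Vec::new();
        {
            // One self-contained IPC message per batch keeps the sketch
            // simple; a real server would reuse one writer per response.
            let mut writer = StreamWriter::try_new(&mut buf, &batch.schema())?;
            writer.write(&batch)?;
            writer.finish()?;
        }
        tx.send(buf).await?; // awaits when the channel is full: backpressure
    }
    Ok(())
}

The channel is bounded on purpose: a slow client then throttles the database fetch instead of re-creating the buffering problem inside the server.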