
Support for streaming query results (Rust API) #854

@harshkumar314e

Description


Summary

Currently, ConnectorX loads the entire query result into memory before returning it as a Vec.
This makes it inefficient, or even infeasible, for large datasets, especially in a long-running server process, where memory usage can easily grow to multiple gigabytes and potentially cause an OOM kill.

I’m using the Rust implementation of ConnectorX in a server context and would like to fetch and propagate results in a streaming fashion instead of materializing everything at once.

Problem

ConnectorX’s design focuses on parallelized, in-memory loading of data.
While this is optimal for batch ETL and data science workflows, it’s not suitable for:

  • Long-running backend services
  • APIs that need to progressively send data to clients
  • Large analytical queries that don’t fit in memory

When dealing with tens or hundreds of millions of rows, the current API (returning a Vec) forces all data to be stored in RAM at once.
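To make the scale concrete, here is a back-of-the-envelope sizing in Rust. The 100-bytes-per-row figure is an illustrative assumption, not a measurement of ConnectorX's actual per-row overhead:

```rust
fn main() {
    // Hypothetical sizing: 100 million rows at ~100 bytes each (assumed average).
    let rows: u64 = 100_000_000;
    let bytes_per_row: u64 = 100;
    let total_bytes = rows * bytes_per_row;
    let total_gib = total_bytes as f64 / (1024.0 * 1024.0 * 1024.0);
    // A single fully materialized result set on this order is ~9.3 GiB of RAM,
    // all of which must be resident before the Vec is even returned to the caller.
    println!("{total_gib:.1} GiB");
}
```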

Proposed Solution

Introduce a streaming interface to ConnectorX’s Rust API, e.g.:

use futures::TryStreamExt; // assuming the stream yields Result<RecordBatch, _>

let mut stream = cx_stream::<PostgresArrowTransport>(&source_conn, &query)?;
while let Some(batch) = stream.try_next().await? {
    // process or forward each RecordBatch incrementally
}

This could work similarly to Arrow Flight’s Stream or Polars’ streaming mode, allowing users to:

  • Fetch batches progressively from the database driver
  • Process or forward each batch without holding all in memory
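To sketch the bounded-memory model this implies, here is a std-only illustration using a bounded channel for backpressure. Nothing here is ConnectorX's API: `spawn_fetcher` and the `RecordBatch` alias are hypothetical stand-ins for a driver that produces batches, and the point is only that the consumer holds one batch at a time instead of a Vec of everything:

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

// Stand-in for an Arrow RecordBatch; purely illustrative.
type RecordBatch = Vec<u64>;

/// Hypothetical fetcher: produces `n_batches` batches of `batch_len` rows each
/// and sends them through a channel of capacity 1, so at most two batches
/// (one in flight, one being built) exist at any moment. The bounded channel
/// gives backpressure: the producer blocks until the consumer catches up.
fn spawn_fetcher(n_batches: u64, batch_len: u64) -> Receiver<RecordBatch> {
    let (tx, rx) = sync_channel::<RecordBatch>(1);
    thread::spawn(move || {
        for b in 0..n_batches {
            let batch: RecordBatch = (b * batch_len..(b + 1) * batch_len).collect();
            if tx.send(batch).is_err() {
                break; // consumer dropped the stream early
            }
        }
    });
    rx
}

fn main() {
    let rx = spawn_fetcher(1_000, 1_000);
    let mut total_rows = 0u64;
    // Consumer processes each batch incrementally; peak memory stays O(batch),
    // not O(result set), which is the property the proposal asks for.
    for batch in rx {
        total_rows += batch.len() as u64;
    }
    println!("processed {total_rows} rows, one batch resident at a time");
}
```

An async version would expose the receiver as a `Stream` (as in the proposed `cx_stream` sketch above), but the memory characteristics are the same: a fixed channel capacity caps how far the producer can run ahead of the consumer.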
