adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory #35036

Merged
def- merged 10 commits into MaterializeInc:main from def-:pr-optimize-copy-from-stdin
Feb 26, 2026

@def- def- commented Feb 16, 2026

Tested with 100 million rows using https://github.com/def-/ClickBench/tree/pr-materialize/materialize. Before this PR, ingestion took 101 min, went OoM unless the input files were split, and hit query errors because the results were too large. With this PR, the 100-million-row ingestion runs in 4:38 min on my dev server (8 cores) and should scale roughly linearly with the number of cores. For reference, COPY WITH FREEZE in PostgreSQL takes 20 min.
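
To illustrate how parallel ingestion and bounded memory can go together, here is a minimal, generic sketch. It is not the Rust code from this PR (which is not shown in this conversation). It assumes a producer that reads input lines, a bounded queue that applies backpressure, and worker threads that parse and write batches. All names (`parallel_copy`, `parse_row`, `write_batch`, `max_pending`) are hypothetical.

```python
import queue
import threading


def parallel_copy(lines, parse_row, write_batch,
                  workers=4, batch_size=1000, max_pending=8):
    """Parse `lines` on several workers while buffering a bounded number of batches.

    Hypothetical illustration only; not Materialize's implementation.
    """
    # Bounded queue: put() blocks once max_pending batches are waiting, so
    # the reader cannot run arbitrarily far ahead of the workers.
    pending = queue.Queue(maxsize=max_pending)
    errors = []

    def worker():
        while True:
            batch = pending.get()
            if batch is None:  # sentinel: no more input for this worker
                return
            try:
                write_batch([parse_row(line) for line in batch])
            except Exception as exc:  # keep draining so the producer never deadlocks
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    batch = []
    for line in lines:  # `lines` may be a lazy iterator over the input stream
        batch.append(line)
        if len(batch) >= batch_size:
            pending.put(batch)
            batch = []
    if batch:
        pending.put(batch)
    for _ in threads:  # one sentinel per worker
        pending.put(None)
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
```

In this sketch at most `max_pending` queued batches plus one batch per worker are held at a time, independent of the input size, as long as `write_batch` does not accumulate rows itself. In CPython, threads show only the structure; CPU-bound parsing would need processes or a language with true multi-threading to benefit from extra cores.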

Example test run of the new benchmark in CI with 10 million rows: https://buildkite.com/materialize/release-qualification/builds/1089#019c68b4-4b3f-424f-b3c4-05518e95f1a0

NAME                                | TYPE            |      THIS       |      OTHER      |  UNIT  | THRESHOLD  |  Regression?  | 'THIS' is
--------------------------------------------------------------------------------------------------------------------------------------------------------
CopyFromStdin                       | wallclock       |           3.599 |         101.877 |   s    |    10%     |      no       | better: 28.3 times faster
CopyFromStdin                       | memory_mz       |        1310.349 |        1197.815 |   MB   |    20%     |      no       | worse:   9.4% more
CopyFromStdin                       | memory_clusterd |          28.458 |          28.534 |   MB   |    50%     |      no       | better:  0.3% less
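
The last column of the table can be checked by hand. The sketch below assumes that "times faster" is OTHER / THIS and that the percentages are relative to OTHER. That is an assumption about how the benchmark framework reports results, and the helper names are hypothetical.

```python
def times_faster(this, other):
    """How many times faster THIS is, assuming the ratio is OTHER / THIS."""
    return other / this


def percent_change(this, other):
    """Change of THIS relative to OTHER in percent (positive means more)."""
    return (this - other) / other * 100


# Values copied from the table above.
print(round(times_faster(3.599, 101.877), 1))       # wallclock
print(round(percent_change(1310.349, 1197.815), 1))  # memory_mz
print(round(percent_change(28.458, 28.534), 1))      # memory_clusterd
```

Under these assumptions the three values round to the figures in the table: 28.3 times faster, 9.4% more memory in mz, and 0.3% less in clusterd.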

Run in environmentd spec sheet doesn't scale well in Cloud (should be investigated!), but scales well locally:

cores    local    cloud
    1  178.20s  378.87s
    2   89.73s  170.18s
    4   44.79s  144.72s
    8   25.05s  158.65s
   16   20.72s  119.59s
   32      N/A  117.71s
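
To put numbers on "scales well" versus "doesn't scale well", the timings above can be turned into speedup and parallel efficiency relative to the 1-core run. This is a small sketch; the `scaling` helper is hypothetical, and the data is copied from the table.

```python
# Wall-clock seconds per core count, copied from the table above.
local = {1: 178.20, 2: 89.73, 4: 44.79, 8: 25.05, 16: 20.72}
cloud = {1: 378.87, 2: 170.18, 4: 144.72, 8: 158.65, 16: 119.59, 32: 117.71}


def scaling(timings):
    """Map cores -> (speedup vs. 1 core, parallel efficiency = speedup / cores)."""
    base = timings[1]
    return {cores: (base / secs, base / secs / cores)
            for cores, secs in timings.items()}


for name, timings in (("local", local), ("cloud", cloud)):
    for cores, (speedup, efficiency) in scaling(timings).items():
        print(f"{name:5} {cores:2d} cores: {speedup:5.2f}x, {efficiency:6.1%} efficiency")
```

Locally the speedup stays close to linear up to 4 cores (about 3.98x), reaches about 7.1x at 8 cores, and flattens to about 8.6x at 16. In Cloud it plateaus at roughly 3.2x from 16 to 32 cores.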

Fixes: https://github.com/MaterializeInc/database-issues/issues/7674
Fixes: https://github.com/MaterializeInc/database-issues/issues/9978

Motivation:
COPY FROM STDIN has been slow for workload replay and, more generally, for testing with large amounts of data. It has also been a pain point in https://github.com/MaterializeInc/database-issues/issues/7674 and, recently, in https://materializeinc.slack.com/archives/C08A62E0751/p1770835109967349

@github-actions

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@def- def- force-pushed the pr-optimize-copy-from-stdin branch 8 times, most recently from 4e1af8e to 76a31a3 Compare February 17, 2026 11:55
@def- def- changed the title adapter: Optimize COPY FROM STDIN to use constant memory & parallelize it adapter: Optimize COPY FROM STDIN, parallelize it, use constant memory Feb 17, 2026
@def- def- force-pushed the pr-optimize-copy-from-stdin branch 3 times, most recently from d5b109c to 0dd54e9 Compare February 17, 2026 14:29
@def- def- marked this pull request as ready for review February 17, 2026 14:34
@def- def- requested review from a team as code owners February 17, 2026 14:34
@def- def- requested a review from aljoscha February 17, 2026 14:34
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from 0dd54e9 to 862818c Compare February 17, 2026 14:47
@def- def- requested review from SangJunBak and ggevay and removed request for aljoscha February 22, 2026 13:50

@ggevay ggevay left a comment

Thanks, impressive performance gains! Looks good overall, wrote some comments.

Comment thread src/pgwire/src/protocol.rs Outdated
Comment thread src/pgwire/src/protocol.rs
Comment thread src/persist-client/src/cfg.rs Outdated
Comment thread src/pgwire/src/protocol.rs
Comment thread src/pgwire/src/protocol.rs
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from 862818c to 864ae06 Compare February 24, 2026 16:23
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from 864ae06 to de043d4 Compare February 24, 2026 17:06
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from de043d4 to 803a1d5 Compare February 24, 2026 17:12
@def- def- requested review from DAlperin and ggevay February 24, 2026 22:29
@def- def- marked this pull request as draft February 25, 2026 08:14
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from e0ab7e6 to 195b7d1 Compare February 25, 2026 10:05
@def- def- marked this pull request as ready for review February 25, 2026 10:11
@def- def- force-pushed the pr-optimize-copy-from-stdin branch from 195b7d1 to 4f944a1 Compare February 25, 2026 11:38

@ggevay ggevay left a comment

LGTM!

@def- def- merged commit 07e0546 into MaterializeInc:main Feb 26, 2026
131 checks passed
@def- def- deleted the pr-optimize-copy-from-stdin branch February 26, 2026 13:15

3 participants