Conversation

@noahisaksen (Contributor) commented May 29, 2025

#328

LIMIT pushdown was added to this extension in #313.

It pushes the LIMIT down to Postgres and removes it from the DuckDB query plan.
The problem is that the extension can parallelize a scan, turning one DuckDB query into n Postgres queries. Pushing the LIMIT down into each of those queries then falsifies the result, as the standalone sketch below illustrates.
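As a minimal illustration (plain standalone C++, not the extension's actual code), here is a scan split into per-page tasks: applying the LIMIT inside every task returns one row per task, while applying it once above the scan returns the single row the query asked for.

```cpp
// Standalone sketch: per-task LIMIT vs. global LIMIT under a parallel scan.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Five "tasks", each scanning one page of the table.
    std::vector<std::vector<int>> pages = {{1, 2}, {3, 4}, {5, 6}, {7, 8}, {9, 10}};
    const size_t limit = 1;

    // LIMIT pushed down into every task: each task returns up to `limit` rows.
    std::vector<int> pushed;
    for (auto &page : pages) {
        for (size_t i = 0; i < std::min(limit, page.size()); i++) {
            pushed.push_back(page[i]);
        }
    }
    printf("per-task LIMIT: %zu rows (wrong)\n", pushed.size()); // 5 rows

    // LIMIT applied once, above the parallel scan: the correct result.
    std::vector<int> global;
    for (auto &page : pages) {
        for (int v : page) {
            if (global.size() >= limit) {
                break;
            }
            global.push_back(v);
        }
    }
    printf("global LIMIT: %zu rows (correct)\n", global.size()); // 1 row
    return 0;
}
```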

A test is added that reproduces this by forcing parallelism with set pg_pages_per_task=1.

I added a check on max_threads so that the LIMIT is NOT pushed down if max_threads is anything other than 1.
To the best of my knowledge, this should prevent LIMIT from being pushed down in any case where we would have parallel queries. (There could be a better way to check for parallelism that I am not aware of.)
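For illustration only, a simplified sketch of that guard; the struct and function names here are hypothetical stand-ins, not the extension's actual identifiers.

```cpp
// Hypothetical, simplified bind data; the real extension's types differ.
struct ScanBindData {
    long limit = -1;     // -1: no LIMIT pushed down to Postgres
    int max_threads = 1; // how many parallel scan tasks DuckDB may launch
};

// Only push the LIMIT down when the scan cannot run in parallel.
void TryPushdownLimit(ScanBindData &bind_data, long limit_value) {
    if (bind_data.max_threads != 1) {
        // A parallel scan would apply the LIMIT per task, not globally: skip.
        return;
    }
    bind_data.limit = limit_value;
}
```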

Previously

set pg_pages_per_task=1;
FROM s.large_tbl LIMIT 1;

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(0,0)'::tid AND '(1,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(1,0)'::tid AND '(2,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(2,0)'::tid AND '(3,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(3,0)'::tid AND '(4,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(4,0)'::tid AND '(5,0)'::tid LIMIT 1) TO STDOUT (FORMAT "binary");

┌────────────┐
│     i      │
│   int64    │
├────────────┤
│      70286 │
│      72772 │
│      74128 │
│      75032 │
│      76388 │
│      79100 │
└────────────┘

After (LIMIT not pushed down during parallel scan)

set pg_pages_per_task=1;
FROM s.large_tbl LIMIT 1;

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(0,0)'::tid AND '(1,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(1,0)'::tid AND '(2,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(2,0)'::tid AND '(3,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(3,0)'::tid AND '(4,0)'::tid) TO STDOUT (FORMAT "binary");

COPY (SELECT "i" FROM "public"."large_tbl" WHERE ctid BETWEEN '(4,0)'::tid AND '(5,0)'::tid) TO STDOUT (FORMAT "binary");

┌───────┐
│   i   │
│ int64 │
├───────┤
│   0   │
└───────┘

@Mytherin (Contributor) commented Jun 10, 2025

Thanks for the PR! The problem makes sense to me. Perhaps instead of not applying the optimization, we could set the thread count to 1 when the optimization is applied? I.e. where we have bind_data.limit = ... we also set bind_data.max_threads = 1. That should also fix the issue while still allowing the optimization to be applied.
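A sketch of this suggestion, reusing the hypothetical ScanBindData from the sketch above: keep the pushdown, but cap the scan at a single task so that the per-task LIMIT is also the global LIMIT.

```cpp
// Push the LIMIT down and force a single scan task (illustrative names only).
void PushdownLimit(ScanBindData &bind_data, long limit_value) {
    bind_data.limit = limit_value;
    bind_data.max_threads = 1; // one task -> one Postgres query -> correct LIMIT
}
```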

@Mytherin merged commit c0411b9 into duckdb:main on Jun 12, 2025 (16 checks passed).
@Mytherin commented:
I'll merge this as-is for now; we can pick up enabling the optimization in more scenarios later on.
