-
Notifications
You must be signed in to change notification settings - Fork 70
Description
What happens?
We started seeing duplicated rows when extracting data from PostgresDB and noticed that it happens when the row is updated/created when different transaction starts, which seems to be similar to what was reported back in #163.
I checked our logs with the flag pg_debug_show_queries=true after I saw this change and validated all connections are using BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY as expected.
However, I noticed SET TRANSACTION SNAPSHOT 'some-snapshot-id' is never called and neither the expected query before that: SELECT pg_is_in_recovery(), pg_export_snapshot(), (select count(*) from pg_stat_wal_receiver).
This query works when executed directly in our database and provides the following result:
SELECT pg_is_in_recovery(), pg_export_snapshot(), (select count(*) from pg_stat_wal_receiver);
pg_is_in_recovery | pg_export_snapshot | count
-------------------+---------------------+-------
f | 0000102C-00400680-1 | 0
I suspect the code is not reaching the part where it gets pg_export_snapshot for some reason and therefore affecting the results of the reading operation even when the isolation level is correct. Our workflow has pg_connection_limit=4.
To Reproduce
A bit hard to reproduce since the update operation of a row needs to happen between the start of concurrent transactions (we see this issue happening a couple of times per week), but we're happy to test some code in our workflows.
OS:
Linux
PostgreSQL Version:
17.5
DuckDB Version:
1.4.1
DuckDB Client:
Python
Full Name:
Rina Sakurai
Affiliation:
Mentimeter
Have you tried this on the latest main branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree