r-n-o/pgx-conn-repro
pgx/pgconn repro

Mini-repo to showcase an undesirable behavior in pgx/pgconn: DB connections can become forever lodged in a server's connection pool. Discussion at jackc/pgx#2290.

Bringing up DB and server

To bring up the DB:

make db

Then, insert some dummy data:

make seed

Finally, bring up the server:

make server

Reproduction steps

Query the normal count endpoint, observe it's working and fast:

$ time curl localhost:8888/count_rows
Total users: 4
curl localhost:8888/count_rows  0.00s user 0.00s system 53% cpu 0.014 total

Query the fetch endpoint without delay, observe it's working and fast as well:

$ time curl localhost:8888/fetch_rows                          
1: Alice
2: Bob
3: Charlie
4: David
curl localhost:8888/fetch_rows  0.00s user 0.00s system 52% cpu 0.011 total

Query the fetch_rows endpoint with an artificial delay, which simulates a slow query or a service call that takes a long time:

$ time curl "localhost:8888/fetch_rows?delay=true"             
Request timed out
curl "localhost:8888/fetch_rows?delay=true"  0.00s user 0.00s system 0% cpu 2.012 total

Now observe that both fetch_rows (with no artificial delay) and count_rows are timing out:

$ time curl localhost:8888/fetch_rows
Request timed out
curl localhost:8888/fetch_rows  0.00s user 0.00s system 0% cpu 2.014 total

$ time curl localhost:8888/count_rows
Request timed out
curl localhost:8888/count_rows  0.00s user 0.00s system 0% cpu 2.011 total

This should surprise you!

This is because our DB pool size is 1, and the single connection in that pool is still busy with the earlier query: its rows were never fully read or closed, so the connection is never returned to the pool. Subsequent requests block trying to acquire a connection, eventually give up, and time out.

Solution

The obvious solution is to write correct code and always remember to add defer rows.Close() immediately after querying rows. This ensures the connection is released back to the pool and can be reused by subsequent requests.
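A hedged sketch of the corrected handler body, assuming pgx v5 and a pgxpool.Pool (the query and column names are illustrative, not taken from this repo):

```go
// rows must be closed on every path; defer makes that unconditional.
rows, err := pool.Query(ctx, "SELECT id, name FROM users")
if err != nil {
	return err
}
defer rows.Close() // guarantees the connection returns to the pool

for rows.Next() {
	var id int
	var name string
	if err := rows.Scan(&id, &name); err != nil {
		return err
	}
	fmt.Printf("%d: %s\n", id, name)
}
return rows.Err() // surfaces any error that ended iteration early
```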

The question is: can we do better? What if this mistake is made in the wild? I'd love to find a more systematic solution to either detect or mitigate this class of mistakes. Can we ensure connections make it back to the pool after N seconds, for example? Currently, connections which are "lodged" like this never become available again on their own; the only remedy is restarting the server.

At scale, this error makes the entire server unresponsive: all traffic that requires DB interaction grinds to a halt once the DB pool is depleted.
