pgx/pgconn repro

Mini-repo to showcase an undesirable behavior related to pgx/pgxconn and the possibility of forever-lodged DB connections in servers connection pools. Discussion at jackc/pgx#2290.

Bringing up DB and server

To bring up the db

make db

Then, insert some dummy data

make seed

Finally, bring up the server

make server

Reproduction steps

Query the normal count endpoint, observe it's working and fast:

$ time curl localhost:8888/count_rows
Total users: 4
curl localhost:8888/count_rows  0.00s user 0.00s system 53% cpu 0.014 total

Query the fetch endpoint without delay, observe it's working and fast as well:

$ time curl localhost:8888/fetch_rows                          
1: Alice
2: Bob
3: Charlie
4: David
curl localhost:8888/fetch_rows  0.00s user 0.00s system 52% cpu 0.011 total

Query the fetch_rows endpoint with an artificial delay, which simulates a slow query or a service call that takes a long time:

$ time curl "localhost:8888/fetch_rows?delay=true"             
Request timed out
curl "localhost:8888/fetch_rows?delay=true"  0.00s user 0.00s system 0% cpu 2.012 total

Now observe that both fetch_rows (with no artificial delay) and count_rows are timing out:

$ time curl localhost:8888/fetch_rows
Request timed out
curl localhost:8888/fetch_rows  0.00s user 0.00s system 0% cpu 2.014 total

$ time curl localhost:8888/count_rows
Request timed out
curl localhost:8888/count_rows  0.00s user 0.00s system 0% cpu 2.011 total

This should surprise you!

This is because our DB pool size is 1, and the single connection in that pool is still busy with data which isn't read yet (rows not closed correctly). As a result the connection isn't available, requests block on acquiring a DB connection, eventually fail to do so, and time out.

Solution

The obvious solution is to write correct code and always remember to insert defer rows.Close() after querying rows. This will ensure the connection is released and can be reused by subsequent requests.

The question is: can we do better? What if this mistake is made in the wild? I'd love to find a more systematic solution to either detect or mitigate this class of mistakes. Can we ensure connections make it back to the pool after N seconds for example? Currently connections which are "lodged" like above will never become available again on their. The only solution is a restart of the server.

At scale, this error causes the entire server to become unresponsive. All traffic which requires DB interaction grinds to a halt once a DB pool is depleted.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Makefile		Makefile
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pgx/pgconn repro

Bringing up DB and server

Reproduction steps

Solution

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

pgx/pgconn repro

Bringing up DB and server

Reproduction steps

Solution

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages