Mini-repo to showcase an undesirable behavior related to pgx/pgxconn and the possibility of forever-lodged DB connections in servers connection pools. Discussion at jackc/pgx#2290.
To bring up the db
make db
Then, insert some dummy data
make seed
Finally, bring up the server
make server
Query the normal count endpoint, observe it's working and fast:
$ time curl localhost:8888/count_rows
Total users: 4
curl localhost:8888/count_rows 0.00s user 0.00s system 53% cpu 0.014 total
Query the fetch endpoint without delay, observe it's working and fast as well:
$ time curl localhost:8888/fetch_rows
1: Alice
2: Bob
3: Charlie
4: David
curl localhost:8888/fetch_rows 0.00s user 0.00s system 52% cpu 0.011 total
Query the fetch_rows endpoint with an artificial delay, which simulates a slow query or a service call that takes a long time:
$ time curl "localhost:8888/fetch_rows?delay=true"
Request timed out
curl "localhost:8888/fetch_rows?delay=true" 0.00s user 0.00s system 0% cpu 2.012 total
Now observe that both fetch_rows (with no artificial delay) and count_rows are timing out:
$ time curl localhost:8888/fetch_rows
Request timed out
curl localhost:8888/fetch_rows 0.00s user 0.00s system 0% cpu 2.014 total
$ time curl localhost:8888/count_rows
Request timed out
curl localhost:8888/count_rows 0.00s user 0.00s system 0% cpu 2.011 total
This should surprise you!
This is because our DB pool size is 1, and the single connection in that pool is still busy with data which isn't read yet (rows not closed correctly). As a result the connection isn't available, requests block on acquiring a DB connection, eventually fail to do so, and time out.
The obvious solution is to write correct code and always remember to insert defer rows.Close() after querying rows. This will ensure the connection is released and can be reused by subsequent requests.
The question is: can we do better? What if this mistake is made in the wild? I'd love to find a more systematic solution to either detect or mitigate this class of mistakes. Can we ensure connections make it back to the pool after N seconds for example? Currently connections which are "lodged" like above will never become available again on their. The only solution is a restart of the server.
At scale, this error causes the entire server to become unresponsive. All traffic which requires DB interaction grinds to a halt once a DB pool is depleted.