node-postgres handles server disconnection differently on macOS and Linux #1942
The pool removes clients pretty aggressively (e.g. even when the query error is an error in the query). Are you using
We're using
It looks like a pg-promise bug to me. https://github.com/vitaly-t/pg-promise/blob/3d2c670af42e9e1d93d4dc23095061550b40a7c3/lib/database.js#L333 The connection is returned to the pool even if the query fails because the connection dropped.
@charmander Could you please point to anywhere in the documentation, or otherwise, where it says that a dropped connection object cannot be returned back to the pool? This is the first time I hear of something like that.
@vitaly-t It's in the docs: https://node-postgres.com/api/pool#releaseCallback The API has been like that for as long as I can remember.
@sehrope What is in the docs about not allowing a dropped connection to be released back to the pool? I don't see anything.
The pool is not aware of the connection status. It's up to the user to inform the pool whether the connection is still valid (by releasing it normally) or should be destroyed (by passing a truthy value to the release function). From the docs:
A "smarter" wrapper might differentiate between transient errors (ex: a UNIQUE key violation) and permanent ones (ex: a killed connection), but a simple default is to design applications with queries that do not fail, and to evict connections from the pool when they do.
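A minimal sketch of that simple default, using the pool/release API referenced above (the runQuery helper is illustrative, not part of pg or pg-promise):

```js
const { Pool } = require('pg')

const pool = new Pool()

async function runQuery(text, params) {
  const client = await pool.connect()
  try {
    const result = await client.query(text, params)
    client.release() // success: return the client to the pool for reuse
    return result
  } catch (err) {
    // Treat any error as fatal for this connection: passing a truthy value
    // (here, the error itself) tells the pool to destroy the client.
    client.release(err)
    throw err
  }
}
```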
@sehrope Thanks! I may need to make some changes in pg-promise now :)
Hi, I also experienced the same issue some time back.
EDIT: @sehrope I found that clients couldn’t be reused even after a UNIQUE key violation, but perhaps that was just something wrong with my setup. I’m afraid I can’t confirm it right now (away from computer). Best,
+1
A client should be reusable after running an errant command. If you're within a transaction you'll need to issue a ROLLBACK first. If you encounter an error where that is not the case, please provide an example, as that would be a bug.
This is likely due to being within a transaction. You'll need to either issue a ROLLBACK so the client can be reused, or release it with an error so the pool discards it.
@sehrope thank you. It wasn’t in a transaction though; I’ll try to reproduce it in a minimal setup when I get back to my computer.
Here's what troubles me... this documentation implies that the client is supposed to be able to distinguish a dropped connection from all other types of errors, if it is to pass a truthy value into the release function. But this is far from trivial, especially when executing a transaction: there are so many different types of failures that may occur, and it is not obvious how to separate them from what actually represents an issue in communication with the server. Some time ago I had to write a complex function to detect just that, which I still use to check whether or not to attempt executing ROLLBACK. And this is far from a friendly approach that can be expected of client-side code. My understanding thus far is that the driver must be able to identify connectivity issues on its own, and mark the client as disconnected, because it is too complex to expect the client to do so. And this is the piece of information that's totally missing in the documentation here. So how should this be approached?
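For illustration only, a rough sketch of the kind of error classification being described here; the specific codes checked are assumptions about what counts as a connectivity failure, not pg-promise's actual implementation:

```js
// Rough heuristic: treat socket-level failures and PostgreSQL "connection"
// error classes as connectivity problems, and everything else as query errors.
function isConnectivityError(err) {
  // Node socket-level failures surface as error codes like these
  const socketCodes = ['ECONNRESET', 'ECONNREFUSED', 'ETIMEDOUT', 'EPIPE']
  if (socketCodes.includes(err.code)) {
    return true
  }
  // SQLSTATE class 08 is "Connection Exception" (e.g. 08006 connection_failure);
  // 57P01/57P02/57P03 are admin_shutdown / crash_shutdown / cannot_connect_now.
  const code = typeof err.code === 'string' ? err.code : ''
  return code.startsWith('08') || ['57P01', '57P02', '57P03'].includes(code)
}
```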
A while back I proposed brianc/node-pg-pool#119, which we use internally. This won’t catch everything, but it at least handles those cases more nicely. Obviously those internal properties shouldn’t be accessed like that, so the client would need an external API for that.
@johanneswuerbach If it does throw away broken connections from the pool in a timely fashion, then it's all we need. And as per my previous post, I fully support that broken connections must be handled between the driver and the pool, and not by the client. The client should only receive a notification when the connection dies, as it does now (in part), and that's it.
@vitaly-t wrote:
The simplest approach is to treat any error as non-recoverable and instruct the pool to evict the connection. That will always be correct, though it could lead to thrashing your pool if errors are common enough in normal routes, for example if you try to create a resource and get a duplicate key error. A more advanced approach would be to default to discarding on errors, but if you're wrapped in a transaction then attempt a ROLLBACK. If the ROLLBACK succeeds then consider the connection salvaged. Otherwise, discard it. This will handle the previous thrashing situation yet still evict the connection if it is totally broken (ex: TCP error).
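A sketch of that more advanced approach (withTransaction is an illustrative helper under these assumptions, not an API of pg or pg-promise):

```js
// Default to evicting on error, but attempt a ROLLBACK first and keep the
// connection if the ROLLBACK succeeds.
async function withTransaction(pool, work) {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    const result = await work(client)
    await client.query('COMMIT')
    client.release() // healthy connection goes back to the pool
    return result
  } catch (err) {
    try {
      await client.query('ROLLBACK') // salvage attempt
      client.release() // ROLLBACK worked, so the connection is still usable
    } catch (rollbackErr) {
      client.release(rollbackErr) // ROLLBACK failed too: evict the connection
    }
    throw err // always surface the original error to the caller
  }
}
```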
I wouldn't bother with anything general like that. In theory you could check for a FATAL error from the server or something similar from the internal driver, but much safer and easier to default to evicting and not try to be too clever.
You always need a way for the client to inform the driver that the connection should be evicted, even if the connection is still usable. Say you had a route that alters the session in some way that is not recoverable (some
The failures could happen at any time so the end user must always deal with errors directly. Even a "test after checkout from pool" could fail between the test and the user's actual DB work. It might work the majority of the time but there's a race condition if it's killed in between the test completing and the user performing the actual work. |
brianc/node-pg-pool#119 throws away the connections when the library user calls client.release and the pg client is not in a queryable state. The queryable state is determined by the driver itself and should already be a good source of truth. Non-queryable clients directly return errors if you try to call client.query already today, so some of that logic is already out there and tested. The only thing not detectable so far is returning a client in an open, failed, but not rolled-back transaction, as that condition is currently not detected by the driver.
If we are not clever enough, we may mistakenly dispense with a live connection while inside a transaction. When a failure occurs inside a transaction, and before executing ROLLBACK, we need to verify whether the connection is still alive. Otherwise, if we just start throwing away live connections, we will end up with transactions locked on the server, without a good reason.
@johanneswuerbach Are you still using
If you're the one handling transaction management then you know if you're in a transaction and should attempt a ROLLBACK for a failed unit of work. If you're not the one handling transaction management then you're not handling transaction management and should not attempt anything. Are you trying to infer if the user of your library failed a transaction outside of a library-level helper and trying to recover from it?
If you're the one managing the transaction then just try to do the ROLLBACK. It either works (connection is good) or it doesn't work (connection is definitely no good). It's even possible that the connection is in a workable state (i.e. TCP is fine, DB is still open) when you "verify" that a ROLLBACK is possible and it gets killed before you actually ROLLBACK. Best to just try it and see if it worked.
The PostgreSQL server will ROLLBACK any open transactions for closed connections. Otherwise you could permanently lock up any server by running:

```sql
BEGIN;
SELECT * FROM some_table FOR UPDATE;
-- Exit the connection...
```
It is not that simple. See the original issue, which forced me to start using that error verification. Presumably, it is now fixed. I'm going to do some good re-testing of this again, and if it is indeed fixed, it does simplify things a bit. Previously, we could often end up with the driver's
I will get back on this after some more testing.
@vitaly-t Okay that makes sense as a workaround but only because not bubbling up the connection error was a bug. Assuming that's fixed you shouldn't have to do anything like that and blindly issuing a ROLLBACK on error should work. |
@vitaly-t no, I'm still using pg-promise and happy to help. Not sure how much time I have, but I can definitely test any changes against our internal test suite, which exercises various failover scenarios.
@jmacmahon At this point I suggest that you follow it up here instead ;) |
@sehrope If
We're seeing a similar issue with
The default would be to hang indefinitely, as that's the default behavior for TCP sockets without explicit timeouts or keep-alives. You can set keep-alive options on the client config and they'll be passed through to the underlying socket. If that gets triggered, it would eventually lead to the socket being marked as dead when the keep-alives do not get a response (see node-postgres/packages/pg/lib/client.js, lines 51 to 52 at 3c48f22).
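As a sketch (assuming the keep-alive options exposed by recent pg versions; check your version's config docs), enabling TCP keep-alives on the pool config looks like:

```js
const { Pool } = require('pg')

const pool = new Pool({
  // ask the OS to send TCP keep-alive probes on idle sockets so a silently
  // dead peer is eventually detected and the socket errors out
  keepAlive: true,
  // wait 10s of inactivity before the first keep-alive probe
  keepAliveInitialDelayMillis: 10 * 1000,
})
```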
You can also fully customize the creation of the socket by passing in a function as the
(see node-postgres/packages/pg/lib/connection.js, lines 19 to 21 at 3c48f22)
That lets you set any additional TCP options you'd like, or even replace the socket with something else entirely (e.g., that's how the Cloudflare streams work). The default creates a plain socket with no additional options (see node-postgres/packages/pg/lib/stream.js, lines 21 to 24 at 3c48f22).
You could customize that to perform the same steps but additionally set a timeout: https://nodejs.org/api/net.html#socketsettimeouttimeout-callback
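A sketch of that customization, assuming the config option in question is a stream factory (the option name and factory form are assumptions here; verify against your pg version):

```js
const net = require('net')
const { Pool } = require('pg')

const pool = new Pool({
  // assumed: pg calls this factory to create the socket for each connection
  stream: () => {
    const socket = new net.Socket()
    // destroy the socket if nothing is received for 30s so a silent outage
    // surfaces as an error instead of hanging; note this also fires on
    // idle-but-healthy connections, so tune or reset it as needed
    socket.setTimeout(30 * 1000, () => {
      socket.destroy(new Error('socket idle timeout'))
    })
    return socket
  },
})
```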
Hello, the company I work for has been using pg-promise for our database service and we've run into an issue with failover, which we believe is an error in the underlying node-postgres library. We believe the issue is due to the way Linux handles socket timeout events differently to macOS.
Steps to reproduce:
Note: we believe this scenario is not specific to RDS, but rather any network outage or server failure which does not send a TCP FIN packet.
Expected outcome and actual outcome on macOS:
Actual outcome on Linux:
Detailed order of events
macOS
Linux