Hello, the company I work for has been using pg-promise for our database service, and we've run into an issue with failover that we believe is a bug in the underlying node-postgres library. We suspect it stems from Linux handling socket timeout events differently from macOS.
Steps to reproduce:
- Connect to a Postgres server. We used one hosted in AWS RDS with MultiAZ failover enabled.
- Run a query every N seconds
- Reboot the server such that the connection is dropped without a TCP FIN packet. We did a reboot with failover in AWS RDS.
Note: we believe this scenario is not specific to RDS, but rather any network outage or server failure which does not send a TCP FIN packet.
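The "run a query every N seconds" step can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: `probeOnce` and `startProbeLoop` are hypothetical names, and the only things assumed from node-postgres are `pool.query()` and the `totalCount` / `idleCount` counters on its `Pool`. Logging the counters after each query makes it visible whether a failed client was dropped from the pool.

```javascript
// Hypothetical helpers for the reproduction; `pool` is expected to look like
// a node-postgres Pool (query(), totalCount, idleCount).

// Run one probe query and describe the outcome together with the pool size,
// so a client that was never evicted shows up as an unchanged totalCount.
async function probeOnce(pool) {
  const startedAt = Date.now();
  try {
    await pool.query('SELECT 1');
    return {
      ok: true,
      ms: Date.now() - startedAt,
      total: pool.totalCount,
      idle: pool.idleCount,
    };
  } catch (err) {
    return {
      ok: false,
      ms: Date.now() - startedAt,
      error: err.message,
      total: pool.totalCount,
      idle: pool.idleCount,
    };
  }
}

// Probe every `intervalMs` milliseconds; returns a function that stops the loop.
function startProbeLoop(pool, intervalMs, report = console.log) {
  const timer = setInterval(() => {
    probeOnce(pool).then(report);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Possible usage, assuming the `pg` package is installed and the connection
// settings come from the usual PG* environment variables:
//
//   const { Pool } = require('pg');
//   const pool = new Pool();
//   pool.on('error', (err) => console.log('pool error event:', err.message));
//   const stop = startProbeLoop(pool, 5000);
```

The pool is passed in rather than created inside the helper so the same loop can be pointed at a pg-promise setup or a stub.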
Expected outcome and actual outcome on macOS:
- The next query fails and the failing client is removed from the pool.
- Subsequent queries use a new client which tries to establish a fresh connection.
- When the server reboot/failover is complete, these queries will succeed.
Actual outcome on Linux:
- The next query fails, but the bad client is not removed from the pool.
- Subsequent queries try to re-use the bad client and fail even after the reboot/failover is complete.
Detailed order of events
macOS
- Successful query
- Reboot DB
- DB stops listening on original IP
- Client begins a further query
- TCP sends query, does not receive an ACK
- TCP begins retransmission, does not receive an ACK
- Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
- Client rejects query promise with "Error: read ETIMEDOUT"
- Immediately after the error the "connection" remains in node-pg's pool
- Almost immediately afterwards the pool emits an "error" event
- The "connection" is removed from node-pg's pool
- Client begins another query
- DNS fetches the new IP
- TCP successfully sends the query to the new IP and receives the result
Linux
- Successful query
- Reboot DB
- DB stops listening on original IP
- Client begins a further query
- TCP sends query, does not receive an ACK
- TCP begins retransmission, does not receive an ACK
- Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
- Client rejects query promise with "Error: Connection terminated unexpectedly"
error Error: Connection terminated unexpectedly
    at Connection.con.once (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:235:9)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Connection.emit (events.js:208:7)
    at Socket.<anonymous> (/src/node_modules/pg-promise/node_modules/pg/lib/connection.js:131:10)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1056:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
- After the error, the "connection" remains in node-pg's pool
- Subsequent queries fail immediately without sending any data with the following error:
error Error: Client has encountered a connection error and is not queryable
    at process.nextTick (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:500:25)
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickDomainCallback (internal/process/next_tick.js:218:9)
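For anyone hitting the same behaviour, one possible stop-gap is to check clients out explicitly and hand any query error back to the pool on release, so the client is destroyed rather than reused. The sketch below is only an illustration under assumptions: `queryWithEviction` is a hypothetical name, it assumes node-postgres' `Pool` is used directly (pg-promise manages its own pool, so this would not drop in unchanged), and it relies on `client.release()` destroying the client when given a truthy argument. It is not a fix for the underlying problem.

```javascript
// Hypothetical wrapper: run one query on a checked-out client and, if the
// query fails, release the client with the error so the pool discards it.
async function queryWithEviction(pool, text, values) {
  const client = await pool.connect();
  try {
    const result = await client.query(text, values);
    client.release(); // healthy client goes back to the pool
    return result;
  } catch (err) {
    // A truthy argument asks the pool to destroy this client instead of
    // returning it to the idle list.
    client.release(err);
    throw err;
  }
}
```

Note that this evicts the client on every failed query, including ordinary SQL errors, which costs an extra reconnect; filtering on the error type would be a reasonable refinement.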