node-postgres handles server disconnection differently on macOS and Linux #1942

Open
@jmacmahon

Hello, the company I work for has been using pg-promise for our database service and we've run into an issue with failover, which we believe is an error in the underlying node-postgres library. We believe the issue is due to the way Linux handles socket timeout events differently to macOS.

Steps to reproduce:

  • Connect to a Postgres server. We used one hosted in AWS RDS with MultiAZ failover enabled.
  • Run a query every N seconds
  • Reboot the server such that the connection is dropped without a TCP FIN packet. We did a reboot with failover in AWS RDS.

Note: we believe this scenario is not specific to RDS, but rather any network outage or server failure which does not send a TCP FIN packet.
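The polling step above can be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: `startPolling` and `formatOutcome` are hypothetical names, and `pool` is assumed to be any object with a promise-returning `query(text)` method, such as a `pg` `Pool`.

```javascript
// Hypothetical helper: turn one query outcome into a log line.
function formatOutcome(n, err) {
  return err ? `query ${n}: failed: ${err.message}` : `query ${n}: ok`;
}

// Repro sketch: issue one query every `intervalMs` and log the outcome.
// `pool` is assumed to expose query(text) returning a promise, as a
// pg Pool does; it is passed in so the loop does not depend on how the
// pool was created.
function startPolling(pool, intervalMs, log = console.log) {
  let attempt = 0;
  const timer = setInterval(async () => {
    const n = ++attempt;
    try {
      await pool.query('SELECT 1');
      log(formatOutcome(n));
    } catch (err) {
      log(formatOutcome(n, err));
    }
  }, intervalMs);
  // Return a function that stops the polling loop.
  return () => clearInterval(timer);
}
```

Leaving the loop running across the reboot should make it easy to see which attempt is the first to fail and whether later attempts recover.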

Expected outcome (which is also the actual outcome on macOS):

  • The next query fails and the failing client is removed from the pool.
  • Subsequent queries use a new client which tries to establish a fresh connection.
  • When the server reboot/failover is complete, these queries will succeed.

Actual outcome on Linux:

  • The next query fails, but the bad client is not removed from the pool.
  • Subsequent queries try to re-use the bad client and fail even after the reboot/failover is complete.
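To illustrate the behaviour we expected, here is a rough sketch of evicting a failed client by hand. It assumes plain `pg` rather than pg-promise, and relies on `client.release()` discarding the client when it is given a truthy argument. `isConnectionError` and `queryWithEviction` are hypothetical helpers, the message patterns are simply the errors quoted in this issue, and we have not verified that this avoids the Linux behaviour.

```javascript
// Messages taken from the errors reported in this issue; the list is
// illustrative, not exhaustive.
const CONNECTION_ERROR_PATTERNS = [
  /Connection terminated unexpectedly/,
  /not queryable/,
  /ETIMEDOUT/,
];

// Hypothetical helper: guess whether an error means the connection is dead.
function isConnectionError(err) {
  const message = String((err && err.message) || '');
  return CONNECTION_ERROR_PATTERNS.some((re) => re.test(message));
}

// Hypothetical helper: run one query on a checked-out client and ask the
// pool to destroy that client if the failure looks connection-related.
async function queryWithEviction(pool, text, values) {
  const client = await pool.connect();
  let fatal;
  try {
    return await client.query(text, values);
  } catch (err) {
    if (isConnectionError(err)) fatal = err;
    throw err;
  } finally {
    // A truthy argument tells pg's pool to discard the client
    // instead of returning it to the idle list.
    client.release(fatal);
  }
}
```

Matching on message text is fragile; it is used here only because those strings are what the failing queries reported.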

Detailed order of events

macOS

  • Successful query
  • Reboot DB
  • DB stops listening on original IP
  • Client begins a further query
  • TCP sends query, does not receive an ACK
  • TCP begins retransmission, does not receive an ACK
  • Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
  • Client rejects query promise with "Error: read ETIMEDOUT"
  • Immediately after the error, the "connection" is still in node-pg's pool
  • Almost immediately afterwards the pool emits an "error" event
  • The "connection" is removed from node-pg's pool
  • Client begins another query
  • DNS fetches the new IP
  • TCP successfully sends the query to the new IP and receives the result
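The pool-level steps in the sequence above (the "error" event and the removal of the client) can be observed with a small logging hook along these lines. This is a sketch only: `attachPoolDiagnostics` is a hypothetical name, and `pool` is assumed to be a `pg` `Pool`, which emits `error` and `remove` events and exposes `totalCount` and `idleCount`.

```javascript
// Diagnostic sketch: log pool-level events together with the pool size.
// `pool` is assumed to be a pg Pool, or anything with the same
// on(event, handler) method and totalCount/idleCount properties.
function attachPoolDiagnostics(pool, log = console.log) {
  const size = () => `total=${pool.totalCount} idle=${pool.idleCount}`;
  // Emitted when an idle client in the pool hits a connection error.
  pool.on('error', (err) => log(`pool error: ${err.message} (${size()})`));
  // Emitted when a client is closed and removed from the pool.
  pool.on('remove', () => log(`client removed (${size()})`));
  return pool;
}
```

Running the same hook on Linux should show whether the `error` and `remove` events are ever emitted there.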

Linux

  • Successful query
  • Reboot DB
  • DB stops listening on original IP
  • Client begins a further query
  • TCP sends query, does not receive an ACK
  • TCP begins retransmission, does not receive an ACK
  • Approximately 18 seconds after sending, TCP ceases retransmission and sends a RST
  • Client rejects query promise with "Error: Connection terminated unexpectedly"
    error Error: Connection terminated unexpectedly
      at Connection.con.once (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:235:9)
      at Object.onceWrapper (events.js:313:30)
      at emitNone (events.js:106:13)
      at Connection.emit (events.js:208:7)
      at Socket.<anonymous> (/src/node_modules/pg-promise/node_modules/pg/lib/connection.js:131:10)
      at emitNone (events.js:111:20)
      at Socket.emit (events.js:208:7)
      at endReadableNT (_stream_readable.js:1056:12)
      at _combinedTickCallback (internal/process/next_tick.js:138:11)
      at process._tickDomainCallback (internal/process/next_tick.js:218:9)
    
  • After the error, the "connection" remains in node-pg's pool
  • Subsequent queries fail immediately without sending any data with the following error:
    error Error: Client has encountered a connection error and is not queryable
      at process.nextTick (/src/node_modules/pg-promise/node_modules/pg/lib/client.js:500:25)
      at _combinedTickCallback (internal/process/next_tick.js:131:7)
      at process._tickDomainCallback (internal/process/next_tick.js:218:9)
    
