Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recover from idle_in_transaction_session_timeout #680

Closed
tmtron opened this issue Dec 6, 2019 · 27 comments
Closed

Recover from idle_in_transaction_session_timeout #680

tmtron opened this issue Dec 6, 2019 · 27 comments

Comments

@tmtron
Copy link

tmtron commented Dec 6, 2019

Expected behavior

After an idle_in_transaction_session_timeout, the used connection will be working properly when used again from the pool.

Actual behavior

After a statement timeout in transaction, the used connection will be "broken" and new queries will not work. The error-message is Client has encountered a connection error and is not queryable

Steps to reproduce

I've added a test-case to my project fork

Environment

  • Version of pg-promise: 10.2.1
  • OS type (Linux/Windows/Mac): Windows 10
  • Version of Node.js: 12.12.0

Related

Notes

I thought that adding another check to isConnectivityError for code 25P03 (see postgres error codes) should work, but it does not. In this case the err parameter has no code property, but a message string.
A nasty workaround for now is to check this message string:

    if (!code && err.message && typeof err.message === 'string') {
        if (err.message.includes('is not queryable')) return true;
    }

@vitaly-t
Copy link
Owner

This is from invalid transaction usage, as the author indicated so himself.

@tmtron
Copy link
Author

tmtron commented Dec 11, 2019

I don't see why this issue is closed: IMHO it is clear that pg-postgres is currently not handling the idle_in_transaction_session_timeout error correctly: i.e. it must be treated as a connectivity error and the "broken" connection must not be reused. I hope we agree on that.

So maybe you can give me some thints on how to get the error-code (and not just the error-text) in the isConnectivityError, I'll try to come up with a PR.

@vitaly-t
Copy link
Owner

vitaly-t commented Dec 11, 2019

I thought the issue was sorted.

Just so, this library does not handle idle_in_transaction_session_timeout. It is a lower level for node-postgres perhaps.

@tmtron
Copy link
Author

tmtron commented Dec 11, 2019

Well, the specific issue in our app is fixed.

But I think it's just too easy that such an issue can slip into the application code again in the future. And then it would be great to have the idle_in_transaction_session_timeout as a safety-net so that the app can recover automatically. Then the app would still be usable until we find the issue in the code and install the new version.

I'll check node-postgres and their issue-list when I find some time.

@vitaly-t
Copy link
Owner

vitaly-t commented Dec 11, 2019

This is an unusual issue, and is the first time I ever hear about it.

And if you can detect it for a specific transaction code, you can force the physical connection to be released, using the following synthetic approach:

async function specialTransaction() {
    const obj = await db.connect(); // manually allocating the connection
    let idleTxIssueFound = false;
    let txData;
    try {
            txData = await obj.tx(async t => {
                 // execute the transaction, and detect the idle issue, if possible
                 idleTxIssueFound = true; // if found an issue
            });
    } catch(e) {
      } finally() {
          // passing true into method `done` will force physical release of the connection
           obj.done(idleTxIssueFound);
           return txData;
        }
} 

This is more of a proof of concept though.

@tmtron
Copy link
Author

tmtron commented Dec 12, 2019

After checking some code and issues of node-postgres, I think my initial assumption was wrong. We will never receive error-code 25P03 from the database. Instead we only get the error-string from node-postgres::

query.handleError(new Error('Client has encountered a connection error and is not queryable'), this.connection)

I think whenever this error happens (for whatever reason - not only when idle_in_transaction_session_timeout is used), pg-promise must remove this connection from the pool because it is broken.

Since node-postgres only returns an error string, and no error-code or error-class that can be checken, the only way to handle this case is to check the text of the error-message.

There are many related issues in node-postgres:

and in node-pg-pool:

@vitaly-t
Copy link
Owner

It's a tough one to tackle. I mean, there is not good fix from this end, by the sound of it.

@tmtron
Copy link
Author

tmtron commented Dec 12, 2019

I can also make the test work without checking the error-string and use client._queryable instead

e.g. change line 59 of function poolConnect from
client.release(kill || client.$connectionError);
to :
client.release(kill || client.$connectionError || !client._queryable);

That is for sure better, but maybe the underscore in the variable name _queryable indicates that this variable is private and should not be used?

@vitaly-t
Copy link
Owner

This does look like a hack-fix, i.e. should work in the current implementation of the client, but may get broken later, as we use a private variable.

@tmtron
Copy link
Author

tmtron commented Dec 12, 2019

That's what I think too: maybe the node-postgres guys will make this pubilc: see node-postgres #2026

@vitaly-t
Copy link
Owner

vitaly-t commented Dec 12, 2019

I think a more appropriate request would be to make the pool automatically release non-queryiable clients. Asking to make a private variable into public isn't a good idea in this case, it's just too low-level for what you are trying to achieve with this.

I will think a little more about it, and perhaps will add that hack-fix, for now.

@vitaly-t
Copy link
Owner

vitaly-t commented Dec 12, 2019

@tmtron Before I make a change, since I cannot reproduce the issue on my side, can you check something for me, please...

When this issue occurs, does the query throw an error? Because if it does, then the best place to amend the code is within isConnectivityError function. And if the query does not throw an error, then I will have to amend the connect method.

@tmtron
Copy link
Author

tmtron commented Dec 13, 2019

since I cannot reproduce the issue on my side

Really? So the reproduction test-case works on your side?
That's strange: the test case can reproduce the issue on my dev-PC and on GitLab CI.

When this issue occurs, does the query throw an error? Because if it does, then the best place to amend the code is within isConnectivityError function.

The first query that is run after the timeout will throw the error string Client has encountered a connection error and is not queryable.
As mentioned in the original post, we could handle this in the isConnectivityError function, but since we only get a string it's quite nasty.

The current version of pg-promise will not recover from this and also the 2nd query in the test will throw the same error-message (because the broken connection, which is not queryable anymore, is reused).

@vitaly-t
Copy link
Owner

vitaly-t commented Jan 12, 2020

@tmtron Sorry for the late response. I can indeed see the issue from the test you provided.

Here I simplified it, enough to still see the issue.

I'm not sure about the proper fix yet. Will get back to it.

@vitaly-t
Copy link
Owner

vitaly-t commented Jan 12, 2020

For now, I have created a branch tx-idle-timeout to experiment with this.

@tmtron If you have a suggestion, please do follow up ;)

@tmtron
Copy link
Author

tmtron commented Jan 12, 2020

@vitaly-t Thanks for looking into this. Unfortunately I have no idea why the broken connection is not removed and reused in the next query.
Just note, that this is not urgent for me, since we can live for now with the quick-hacky fix using client._queryable as explained in this comment above.

vitaly-t added a commit that referenced this issue Jan 12, 2020
@vitaly-t
Copy link
Owner

vitaly-t commented Jan 12, 2020

@tmtron I think I have solved it with this commit, to force killing every connection that's non-queryable.

Can you give it a go to confirm the fix (it is now in the master branch)?

Wow, noticed just now your comment above, as we published almost at the same time :) So it looks like in the end we came to the same conclusion 😄

I will make a new release shortly.

@vitaly-t
Copy link
Owner

Released the fix in v10.3.5.

@vitaly-t vitaly-t added the fixed label Jan 12, 2020
@tmtron
Copy link
Author

tmtron commented Jan 12, 2020

@vitaly-t I've just updated to 10.3.5 and it works for me. All our tests are okay.
I think using the "private" _queryable property of postgres-node is okay for now, since we have a test-case that will fail, if this property will ever change.
And we will keep an eye on node-postgres #2026
Thanks again!

@tmtron tmtron closed this as completed Jan 12, 2020
@zhanghuancs
Copy link

zhanghuancs commented Jan 29, 2020

Hi, I upgraded to 10.3.5, and now I did not see error message Client has encountered a connection error and is not queryable any more, however, it starts to throw error: Connection terminated unexpectedly. I'm pretty new to pg-promise. Could anyone help on this issue? Should I open it as a new issue? Thank you very much.

The stacktrace I have in my logs looks like

Error: Connection terminated unexpectedly
    at u.<anonymous> (/var/task/webpack:/node_modules/pg/lib/client.js:252:9)
    at Object.onceWrapper (events.js:299:28)
    at u.emit (events.js:210:5)
    at u.emit (domain.js:476:20)
    at Socket.<anonymous> (/var/task/webpack:/node_modules/pg/lib/connection.js:76:10)
    at Socket.emit (events.js:210:5)
    at Socket.EventEmitter.emit (domain.js:476:20)
    at TCP.<anonymous> (net.js:658:12)

Environmen

  • Version of pg-promise: 10.3.5
  • OS type (Linux/Windows/Mac): Linux
  • Version of Node.js: 12.12.24

@vitaly-t
Copy link
Owner

@zhanghuancs That sounds like an issue related to keeping a connection open, which might something to do with your hosting environment. This is not related to the original issue here.

@vitaly-t
Copy link
Owner

vitaly-t commented Jan 29, 2020

@tmtron The underlying driver just took in the recommended change for dumping non-queryable connections.

This means the functionality added here will be removed in a new official release that starts using that branch. That probably will be version 11.

@zhanghuancs
Copy link

@zhanghuancs That sounds like an issue related to keeping a connection open, which might something to do with your hosting environment. This is not related to the original issue here.

Hi, @vitaly-t Thank you for the response. I'm using AWS Lambda and follow the instruction here to reuse database connection using AWS Lambda. This might be the issue related to keeping a connection open as you said. Is there a way in pgpromise that I can get the open connection and kill this specific connection? Besides, is it possible that this issue is related to this thread, and it's quoted that the connection errors (and query errors, from my findings) should be passed to client.release. If I catch this error, and how can I release the connection? In my API, I just use .task to chain .oneOrNone and .any queries, very similar to the [transaction example] (https://github.com/vitaly-t/pg-promise/blob/master/examples/transaction.js), but I did not execute .finally(db.$pool.end); in the end, as I do not want to exit the app.

Any help would be appreciated! Thanks.

@Gustav0ar
Copy link

Gustav0ar commented Mar 31, 2020

@vitaly-t Hi, I think there is a regression in this item. I'm importing a big amount of data with a complex query from one remote server to another using this method: https://github.com/vitaly-t/pg-promise/wiki/Data-Imports#massive-inserts with, v10.4.4 I'm getting this error: Client has encountered a connection error and is not queryable . When I rollback to the version that you fixed v10.3.5, the query executes without issues

@vitaly-t
Copy link
Owner

vitaly-t commented Mar 31, 2020

@Gustav0ar Could you, please, re-try, using the latest Beta version?

I am currently updating it, before v10.5.0 release.

@Gustav0ar
Copy link

@Gustav0ar Could you, please, re-try, using the latest Beta version?

I am currently updating it, before v10.5.0 release.

@vitaly-t Apparently it is working fine with the beta version. I'll update if I find any issues

@flovilmart
Copy link

@vitaly-t still seeing the issue on 10.11.0 Anything we can do to help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants