OpenEthereum node stopped importing blocks #11737
Comments
Node has hung again:
I now have an strace of it hanging and a gdb thread dump from when it was hung...
To complement @CrispinFlowerday:
Oh, that sounds like something that'd be plenty useful. Can you share them?
This issue has been here since 2.7.2: https://github.com/openethereum/openethereum/issues/11539 and https://github.com/openethereum/openethereum/issues/11494. Is there a way to downgrade again?
I'm experiencing similar issues on the classic chain after upgrading from 2.7.2 to 3.0.1 to tackle #11744. Now it's stuck on a block and won't sync past it.
(Copied from #11494 since this seems more relevant.) For me, 2.7.2 completely stops syncing every few days. The server still runs and RPC calls are still answered, but it just doesn't get any new blocks, even though it shows 10+ connections. I have full debug logs of a few instances, but they don't seem to say anything interesting. I think it got better when I enabled …
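For anyone who wants to check their own node for this symptom, the hedged sketch below samples the block height twice over a few minutes via JSON-RPC; the endpoint URL and the waiting interval are assumptions for illustration, not details from this thread:

```typescript
// Hedged sketch (not from this thread): confirm the "RPC still answers but no
// new blocks" symptom by sampling the block height twice a few minutes apart.
// Assumes the node's HTTP JSON-RPC endpoint is at http://127.0.0.1:8545 and
// Node.js 18+ (global fetch).
const RPC_URL = "http://127.0.0.1:8545"; // assumption: default RPC port

async function rpc(method: string): Promise<string> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params: [] }),
  });
  return ((await res.json()) as { result: string }).result;
}

async function main(): Promise<void> {
  const peers = parseInt(await rpc("net_peerCount"), 16);
  const before = parseInt(await rpc("eth_blockNumber"), 16);
  await new Promise((r) => setTimeout(r, 5 * 60_000)); // wait ~5 minutes
  const after = parseInt(await rpc("eth_blockNumber"), 16);
  console.log(`peers=${peers}, block ${before} -> ${after}`);
  if (after === before) console.log("Node appears stalled despite answering RPC.");
}

main().catch(console.error);
```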
Agreed, we have this same issue: a full historical archive node with an 8 TB SSD, etc. It hangs and has to be killed and restarted. We are running alternate nodes on 2.5.13 that do not have this issue.
By request from @kcanty, here's my kill script to work around this: https://gist.github.com/phiresky/153411abb794ad61eb4ba949236bc6ce. It's written in JS since it has to connect to the node and does some rudimentary checks.
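For reference, here is a minimal watchdog sketch in the same spirit as the linked gist (not its actual contents): poll eth_blockNumber, and restart the node if the height stops advancing. The RPC URL, the thresholds, and the systemd unit name openethereum.service are assumptions:

```typescript
// Hedged watchdog sketch, assuming a systemd-managed node and Node.js 18+.
import { execSync } from "node:child_process";

const RPC_URL = "http://127.0.0.1:8545"; // assumption: default RPC port
const POLL_MS = 60_000;                  // check once a minute
const MAX_STALL_MS = 10 * 60_000;        // restart after ~10 min without progress

async function blockNumber(): Promise<number> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
  });
  const { result } = (await res.json()) as { result: string };
  return parseInt(result, 16);
}

async function watchdog(): Promise<void> {
  let last = -1;
  let lastProgress = Date.now();
  while (true) {
    try {
      const current = await blockNumber();
      if (current > last) {
        last = current;
        lastProgress = Date.now();
      } else if (Date.now() - lastProgress > MAX_STALL_MS) {
        console.log(`No new blocks since #${last}, restarting node.`);
        execSync("systemctl restart openethereum.service"); // assumption: unit name
        lastProgress = Date.now();
      }
    } catch (err) {
      console.error("RPC check failed:", err);
    }
    await new Promise((r) => setTimeout(r, POLL_MS));
  }
}

watchdog().catch(console.error);
```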
Thanks @phiresky - appreciate it - looks good.
There is definitely something going on since 2.7.2 (database format changes?). 2.7.2 and 3.0.0 either sync too slowly to ever catch up to the tip, or freeze (stop importing blocks) periodically. This happens on full/fast and archive nodes, both with tracing. 2.5.13 has no issues.
I've been watching this issue and other issues like it on OpenEthereum, and I'm just wondering if this has any effect on the pending Berlin hard fork. I guess if OpenEthereum nodes can't sync, it doesn't really have a deleterious effect on the chain, but if everyone is forced to upgrade because of the Berlin fork, and a good portion of those newly upgraded nodes can't sync, doesn't that lower the overall security of the system as a whole? Is there any sense of how many OpenEthereum nodes are experiencing this problem? I've not yet upgraded to 2.7.2 because of this issue, but with Berlin approaching, I will have to.
So... my OpenEthereum node is currently hung. I'm happy to do any debugging on it over the next 48 hours that would help the devs track this issue down:
After the first and only restart (since syncing it had consumed 21 GB of virtual memory / 5 GB resident), OpenEthereum/v3.0.1-stable-8ca8089-20200601/ has been working fine for me for 5 days on ETC (currently 12.5 GB of virtual memory). Parity-Ethereum/v2.6.8-beta-9bf6ed8-20191231/ works fine on ETH, but it can stop syncing if resident memory runs out. This is a very old issue that came from earlier versions. Maybe the latest versions have a memory leak that leads to consuming all available RAM and thus to syncing stopping.
This is currently a blocker for us to support the upcoming Berlin fork.
One data point: our stuck OpenEthereum node (I work with Vogelito) did eventually emit more logs:
I've now restarted it with logging set to …
Our node hung again today, and I have more detailed logging. For reference, the block it is reporting is … My observation is just that logging from the IO workers apparently stops: there is a large gap in the logs before the logs from my queries for its block number start. No idea if this is helpful at all.
Looks like #11758.
Last night, our OpenEthereum node stopped importing blocks. A restart brought it back to life, but this is an issue for us if it can't keep up to date.
Some observations:
We had been running the same configuration on Parity (I believe we were running 2.5.13 prior to upgrading). Our OpenEthereum node has been alive for about 10 days; it was created from a snapshot of the Parity DB.
Its configuration is: