Spurious disconnect loop when a channel is stuck #3695

yellowred · 2025-03-31T18:28:54Z

We have a case where a local LDK node disconnects a remote peer (LND) on an RAA timeout in order to restore channel operation and send an alert upstream. The problem is that, in our case, the disconnect does not achieve its main goal of restoring the channel: the node keeps cycling through disconnect, reconnect and re-establish indefinitely.

The root cause was a failure in the local node's remote signer that caused one channel to get stuck. The remote signer came back online almost immediately and continued to provide signatures for commitment_signed/revoke_and_ack (CS/RAA) messages, but LDK was unable to recover from the failed state of the stuck channel and did not request any new signatures. And because the node kept disconnecting, the balance kept fluctuating, which made other services further down the stack unreasonably busy.

LDK logs (sorted newest first):

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Disconnecting peer 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa due to not making any progress on channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

... a minute before

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Handling channel resumption for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb with no RAA, no commitment update, 0 pending forwards, 0 pending update_add_htlcs, not broadcasting funding, without channel ready, without announcement, without tx_signatures

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Generating channel update for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Attempting to generate channel update for channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb

(peer_id = 039174f846626c6053ba80f5443d0db33da384f1dde135bf7080ba1eec46501aaa, channel_id = a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb): Reconnected channel a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869bbb with lost outbound RAA and lost remote commitment tx, but unable to send due to resend order, waiting on signer for commitment update
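The last LDK log line describes an ordering constraint: after reconnecting, the node owes the peer both a revoke_and_ack and a commitment_signed, these have to be retransmitted in the order they were originally sent, and the message that must go first is still waiting on the signer. The sketch below models only that gating logic. The ResendOrder, Msg and PendingResend types are hypothetical simplifications for illustration, not LDK's internal types.

```rust
/// Order in which the two lost messages were originally sent (hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ResendOrder {
    CommitmentFirst,
    RevokeAndAckFirst,
}

/// Messages the node may hand to the peer.
#[derive(Debug, PartialEq)]
enum Msg {
    CommitmentSigned,
    RevokeAndAck,
}

/// Both lost messages must be resent; the flags say whether the signer has produced them yet.
struct PendingResend {
    order: ResendOrder,
    raa_ready: bool,
    commitment_ready: bool,
}

/// Returns the messages that can be sent now without violating the original order.
/// A message is released only if every message ahead of it is released as well.
fn sendable(p: &PendingResend) -> Vec<Msg> {
    let sequence = match p.order {
        ResendOrder::CommitmentFirst => [
            (Msg::CommitmentSigned, p.commitment_ready),
            (Msg::RevokeAndAck, p.raa_ready),
        ],
        ResendOrder::RevokeAndAckFirst => [
            (Msg::RevokeAndAck, p.raa_ready),
            (Msg::CommitmentSigned, p.commitment_ready),
        ],
    };
    let mut out = Vec::new();
    for (msg, ready) in sequence {
        // Stop at the first message that is still waiting on the signer.
        if !ready {
            break;
        }
        out.push(msg);
    }
    out
}

fn main() {
    // Roughly the state from the log: the commitment update must go first but is unsigned,
    // so nothing at all can be sent, even if the RAA were ready.
    let stuck = PendingResend {
        order: ResendOrder::CommitmentFirst,
        raa_ready: true,
        commitment_ready: false,
    };
    println!("{:?}", sendable(&stuck)); // prints "[]"
}
```

In this model the channel stays silent until something asks the signer again, which matches the "waiting on signer for commitment update" state persisting across reconnects.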

Error in the remote node (LND):

ChannelLink(c4918671944d25f41b8cc7d4181d6c7b6011dda819daecbfc81ac14a37235bbb:1): received warning message from peer: chan_id=a65623374ac11ac8bfecda19a8dd11607b6c1d18d4c78c1bf4254d9471869aaa, err=Disconnecting due to timeout awaiting response
ChannelPoint(c4918671944d25f41b8cc7d4181d6c7b6011dda819daecbfc81ac14a37235bbb:1): pending remote commitment: (*lnwallet.commitment)(0x400bb1e480)({

Re-establishing the channel manually works, so maybe LDK should query the remote signer more proactively instead of just waiting on the signer for the commitment update.
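One reading of "more proactive" is a periodic task that re-requests signatures for every channel still marked as waiting on the signer, rather than relying on the signer to report that it is back. A minimal sketch, assuming a hypothetical Signer trait that returns None while the remote signer is unreachable; Signer, ChannelState, Node, poll_stuck_channels and FlakySigner are illustrative names, not LDK APIs.

```rust
use std::collections::HashMap;

/// Hypothetical signer interface: `None` means the signer is not reachable right now.
trait Signer {
    fn sign_commitment(&self, channel_id: u32) -> Option<[u8; 4]>;
}

#[derive(Debug, PartialEq)]
enum ChannelState {
    WaitingOnSigner,
    /// Holds the (toy) signature that can now go out in commitment_signed.
    ReadyToSend([u8; 4]),
}

struct Node<S: Signer> {
    signer: S,
    channels: HashMap<u32, ChannelState>,
}

impl<S: Signer> Node<S> {
    /// Meant to run on every timer tick: retry the signer for channels that are still
    /// stuck. Returns how many channels were unblocked on this tick.
    fn poll_stuck_channels(&mut self) -> usize {
        let mut unblocked = 0;
        for (id, state) in self.channels.iter_mut() {
            if *state == ChannelState::WaitingOnSigner {
                if let Some(sig) = self.signer.sign_commitment(*id) {
                    *state = ChannelState::ReadyToSend(sig);
                    unblocked += 1;
                }
            }
        }
        unblocked
    }
}

/// Toy signer that fails until `online` is set, mimicking a brief remote-signer outage.
struct FlakySigner {
    online: bool,
}

impl Signer for FlakySigner {
    fn sign_commitment(&self, channel_id: u32) -> Option<[u8; 4]> {
        if self.online {
            Some(channel_id.to_be_bytes())
        } else {
            None
        }
    }
}

fn main() {
    let mut node = Node {
        signer: FlakySigner { online: false },
        channels: HashMap::from([(7, ChannelState::WaitingOnSigner)]),
    };
    assert_eq!(node.poll_stuck_channels(), 0); // signer still down, channel stays stuck
    node.signer.online = true; // signer comes back
    assert_eq!(node.poll_stuck_channels(), 1); // the next tick unblocks the channel
    println!("{:?}", node.channels[&7]); // prints "ReadyToSend([0, 0, 0, 7])"
}
```

The design choice here is that recovery no longer depends on a callback from the signer: a signer that silently comes back, as in this report, is picked up on the next tick.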


alecchendev · 2025-04-02T18:41:15Z

This may be something we can fix on our side, still figuring it out.

TheBlueMatt · 2025-04-08T17:11:22Z

It does seem like something that should be fixed as part of the async signing logic: if we're stuck waiting for an async signing operation, we shouldn't "blame the peer" and disconnect; we should just keep going and maybe log an issue.
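The distinction being drawn is between a stall caused by the peer and a stall caused by our own signer. A minimal sketch of such a policy, assuming a hypothetical classification of the stall cause; StallCause, Action and on_stall are illustrative names, and the 60-second timeout is an example value only (the logs merely show the disconnect about a minute after the re-establish).

```rust
use std::time::Duration;

/// Why a channel has not made progress (hypothetical classification).
#[derive(Clone, Copy, Debug, PartialEq)]
enum StallCause {
    /// We sent an update and the peer has not answered.
    AwaitingPeer,
    /// We owe the peer a message, but our own async signer has not produced it yet.
    AwaitingLocalSigner,
}

#[derive(Debug, PartialEq)]
enum Action {
    Wait,
    Disconnect,
    LogWarning,
}

/// Decide what to do once a channel has been stalled for `stalled_for`.
fn on_stall(cause: StallCause, stalled_for: Duration, timeout: Duration) -> Action {
    if stalled_for < timeout {
        return Action::Wait;
    }
    match cause {
        // The peer is not responding: a reconnect may genuinely help.
        StallCause::AwaitingPeer => Action::Disconnect,
        // The delay is on our side: disconnecting cannot fix it and only restarts the
        // re-establish cycle, so keep the connection and surface the problem instead.
        StallCause::AwaitingLocalSigner => Action::LogWarning,
    }
}

fn main() {
    let timeout = Duration::from_secs(60);
    let action = on_stall(StallCause::AwaitingLocalSigner, Duration::from_secs(90), timeout);
    println!("{:?}", action); // prints "LogWarning"
}
```

This only stops the local node from starting the loop; the peer can still enforce its own timeout and disconnect.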

TheBlueMatt · 2025-04-09T12:30:16Z

@wpaulino pointed out that even if we don't disconnect in this case, our peer should. The peer may not, so we can still fix our side, but in general it should disconnect, so the real fix is to keep the async signer from stalling.

yellowred changed the title ~~Spurious disconnects when can not make progress on a channel~~ Spurious disconnect loop when a channel is stuck Mar 31, 2025
