Description
- Version: libp2p: 2.10.0
- Platform:
- Subsystem: ReconnectQueue
Severity: Critical
Description:
I have a fairly simple test in which I dial a node, wait for the dialing node to confirm that both sides share the same protocol (handler), close the connection at some point, and then repeat the whole sequence.
I can do this twice before my remote node stops accepting new incoming dials. Logging its stats, I see:
connections total=0 inbound=0 outbound=0 | dialQueue pending=2
(it stays like this for more than 24 hours)
Steps to reproduce the error:
this.queue.add(async (options) => {
  await pRetry(async (attempt) => {
    if (!this.started) {
      return
    }
    try {
      await this.connectionManager.openConnection(peerId, {
        signal: options?.signal
      })
    } catch (err) {
      this.log('reconnecting to %p attempt %d of %d failed - %e', peerId, attempt, this.retries, err)
      throw err
    }
  }, {
    signal: options?.signal,
    retries: this.retries,
    factor: this.backoffFactor,
    minTimeout: this.retryInterval
  })
}, {
  peerId
})
In this code I put a log statement before this.connectionManager.openConnection, and the call seems to hang forever. (I hot-patched in logging before this.connectionManager.openConnection and in a finally block, and the finally block never runs.)
I wonder whether a timeout signal should be passed to this.connectionManager.openConnection.
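To make the timeout idea concrete, here is a minimal sketch of what I have in mind. It is not libp2p's implementation: `OpenConnection`, `withTimeout`, and `dialWithTimeout` are hypothetical names, the peer id is a plain string for brevity, and the timeout value is arbitrary. The point is only that each attempt gets its own deadline, combined with the queue's signal, so a call that never settles cannot block the queue forever.

```typescript
// Hypothetical stand-in for the shape of connectionManager.openConnection.
type OpenConnection = (peerId: string, options: { signal?: AbortSignal }) => Promise<void>

// Derive a signal that aborts when the parent aborts or after `ms` milliseconds.
function withTimeout (parent: AbortSignal | undefined, ms: number): { signal: AbortSignal, clear: () => void } {
  const controller = new AbortController()
  const timer = setTimeout(() => {
    controller.abort(new Error(`dial timed out after ${ms}ms`))
  }, ms)
  const onParentAbort = (): void => {
    controller.abort(parent?.reason)
  }

  if (parent?.aborted === true) {
    onParentAbort()
  } else {
    parent?.addEventListener('abort', onParentAbort, { once: true })
  }

  return {
    signal: controller.signal,
    clear: () => {
      clearTimeout(timer)
      parent?.removeEventListener('abort', onParentAbort)
    }
  }
}

// Run one dial attempt with a deadline. The race guarantees the attempt
// rejects on abort even if `open` itself ignores the signal.
async function dialWithTimeout (open: OpenConnection, peerId: string, ms: number, parent?: AbortSignal): Promise<void> {
  const { signal, clear } = withTimeout(parent, ms)
  const aborted = new Promise<never>((_resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason)
      return
    }
    signal.addEventListener('abort', () => { reject(signal.reason) }, { once: true })
  })

  try {
    await Promise.race([open(peerId, { signal }), aborted])
  } finally {
    // Always release the timer and the parent listener.
    clear()
  }
}
```

Inside the pRetry callback this would replace the bare openConnection call, so that a timed-out attempt throws and pRetry can schedule the next one.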
Another potentially problematic code path: if a peer redials before a connection has been set up, do we close/abort the dial that is in progress, so that we can restart immediately without waiting for a potential timeout?
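To illustrate the behaviour I am asking about, here is a rough sketch. I have not checked whether libp2p already does something like this: `RedialTracker` and `Dial` are hypothetical names, and the peer id is a plain string for brevity. It keeps one AbortController per peer and aborts the in-flight dial whenever a newer dial for the same peer starts.

```typescript
// Hypothetical dial function; stands in for whatever actually opens the connection.
type Dial = (peerId: string, signal: AbortSignal) => Promise<void>

class RedialTracker {
  private readonly inFlight = new Map<string, AbortController>()
  private readonly dial: Dial

  constructor (dial: Dial) {
    this.dial = dial
  }

  // Number of peers with a dial currently in progress.
  get pending (): number {
    return this.inFlight.size
  }

  async redial (peerId: string): Promise<void> {
    // Abort any dial still in progress for this peer so it cannot linger.
    this.inFlight.get(peerId)?.abort(new Error('superseded by a newer dial'))

    const controller = new AbortController()
    this.inFlight.set(peerId, controller)

    try {
      await this.dial(peerId, controller.signal)
    } finally {
      // Only clear the entry if it still belongs to this attempt.
      if (this.inFlight.get(peerId) === controller) {
        this.inFlight.delete(peerId)
      }
    }
  }
}
```

With this shape the older attempt rejects as soon as the newer one starts, instead of waiting for a timeout, and the pending count cannot grow beyond one entry per peer.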
I apologize for not giving an isolated, reproducible example, but I wanted to file this issue quickly to raise awareness, get help, and let other devs know about it.