@cnanakos cnanakos commented Oct 9, 2025

fix(discovery): prevent premature node eviction from routing table

The replaceNode function removes a node if either the replacement cache
has entries or the node's reliability is below the chosen threshold.

With the previous default threshold of 1.0, nodes were removed even when no
replacement candidates were available. This happened because reliability
tracking typically keeps values below 1.0, so the threshold condition
triggered. The default is now 0.0, so nodes are removed only when
replacements exist, which follows Kademlia's approach to handling transient
network issues.
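
As a rough illustration of the rule described above, the following Python sketch models the eviction decision. It is not the actual replaceNode code: the function name `should_evict`, its parameters, and the strict `<` comparison are assumptions made for this example only.

```python
def should_evict(reliability: float, replacement_count: int, threshold: float) -> bool:
    """Return True when a node would be dropped from its bucket.

    Hypothetical model of the rule in the description: evict if the
    replacement cache has entries, or if reliability is below the threshold.
    """
    return replacement_count > 0 or reliability < threshold


# Empty replacement cache, reliability value taken from the logs (0.957).
old_default = should_evict(0.957, replacement_count=0, threshold=1.0)
new_default = should_evict(0.957, replacement_count=0, threshold=0.0)
# With a replacement candidate available, the node is still replaced.
with_replacement = should_evict(0.957, replacement_count=1, threshold=0.0)

print(old_default, new_default, with_replacement)  # True False True
```

Under these assumptions, a threshold of 0.0 can never be undercut by a non-negative reliability value, so eviction depends only on whether a replacement is available.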

Evidence from the logs shows nodes with excellent reliability being removed:

DBG - Node added to routing table topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - bucket topics="discv5" tid=1 depth=0 len=2 standby=0
DBG - node topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=2 reliability=1.0
DBG - node topics="discv5" tid=1 n=1ff*7a561e:10.244.0.208:6890 rttMin=1 rttAvg=14 reliability=1.0
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=1ff*7a561e:10.244.0.208:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=1
DBG - bucket topics="discv5" tid=1 depth=0 len=1 standby=0
DBG - node topics="discv5" tid=1 n=130*db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=165 reliability=0.957
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=130*db8a1b:10.244.2.207:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=0

The first removal is of a node with perfect reliability (1.0) and a 14 ms
average RTT. The second is of a node with 95.7% reliability and a low
minimum RTT (rttMin=1). Both far exceed the 0.5 threshold set by
NoreplyRemoveThreshold.

@cnanakos cnanakos force-pushed the fix-discovery-eviction branch from 7c5ce8c to 142cfe2 Compare October 10, 2025 16:51

Signed-off-by: Chrysostomos Nanakos <[email protected]>
@cnanakos cnanakos force-pushed the fix-discovery-eviction branch from 142cfe2 to 5199e37 Compare October 13, 2025 18:47