fix(discovery): prevent premature node eviction from routing table #106
The replaceNode function removes a node if either the replacement cache
has entries or the node's reliability is below the given threshold.
With the previous default threshold of 1.0, nodes were removed even when
no replacement candidates were available, because reliability tracking
typically keeps values below 1.0 and so the threshold condition triggered.
The default is now 0.0, so nodes are removed only when a proper
replacement exists, which honors Kademlia's approach to handling transient
network issues.
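
The removal rule described above can be illustrated with a small sketch. This is not the project's code: the `Bucket` type, the `replace_node` signature, the strict `<` comparison, and the promotion of a cached candidate are all assumptions made for illustration, written in Python rather than the project's own language.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical stand-in for a routing table bucket."""
    nodes: list = field(default_factory=list)              # active entries
    replacement_cache: list = field(default_factory=list)  # standby candidates


def replace_node(bucket, node, reliability, threshold=0.0):
    """Remove `node` if a replacement exists or its reliability is below
    `threshold`. Returns True if the node was removed.

    Assumption: the comparison is a strict less-than, as the description
    ("below the chosen threshold") suggests.
    """
    if node not in bucket.nodes:
        return False
    has_replacement = bool(bucket.replacement_cache)
    if not has_replacement and reliability >= threshold:
        # No candidate to take the slot and the node is reliable enough: keep it.
        return False
    bucket.nodes.remove(node)
    if has_replacement:
        # Promote a cached candidate into the freed slot (assumed behavior).
        bucket.nodes.append(bucket.replacement_cache.pop())
    return True


# Threshold 1.0: a node at 0.957 is evicted although no replacement exists.
old_default = Bucket(nodes=["130db8a1b"])
print(replace_node(old_default, "130db8a1b", 0.957, threshold=1.0), old_default.nodes)

# Threshold 0.0: the same node stays until a replacement is available.
new_default = Bucket(nodes=["130db8a1b"])
print(replace_node(new_default, "130db8a1b", 0.957), new_default.nodes)
```

Under these assumptions, a threshold of 0.0 makes the reliability condition unreachable, so the replacement cache alone decides whether a node is removed.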
Logs show nodes with excellent reliability being removed:
DBG - Node added to routing table topics="discv5 routingtable" tid=1 n=1ff7a561e:10.244.0.208:6890
DBG - bucket topics="discv5" tid=1 depth=0 len=2 standby=0
DBG - node topics="discv5" tid=1 n=130db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=2 reliability=1.0
DBG - node topics="discv5" tid=1 n=1ff7a561e:10.244.0.208:6890 rttMin=1 rttAvg=14 reliability=1.0
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=1ff7a561e:10.244.0.208:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=1
DBG - bucket topics="discv5" tid=1 depth=0 len=1 standby=0
DBG - node topics="discv5" tid=1 n=130db8a1b:10.244.2.207:6890 rttMin=1 rttAvg=165 reliability=0.957
DBG - Node removed from routing table topics="discv5 routingtable" tid=1 n=130db8a1b:10.244.2.207:6890
DBG - Total nodes in discv5 routing table topics="discv5" tid=1 total=0
The first removal is of a node with perfect reliability (1.0) and a 14 ms
average RTT. The second is of a node with 95.7% reliability and a minimum
RTT of 1 ms. Both far exceed the 0.5 threshold set by NoreplyRemoveThreshold.