Fix router-backtrack cases in last-hop hints #3586

TheBlueMatt · 2025-02-03T22:59:10Z

This fixes the router issues identified by the fuzzer in #3553, which ultimately occurred because the router was backtracking and overwriting the preferred path to a node after we'd already processed hops away from that node. We fix it both for blinded and non-blinded paths, though in two very different ways (removing a lot of code for the non-blinded case!).

If we have a first-hop channel from a first-hop hint, we'll ignore the fees on it as we won't charge ourselves fees. However, if we have a first-hop channel from the network graph, we should do the same. We do so here, also teeing up a coming commit which will remove much of the custom codepath for first-hop hints and start using this common codepath as well.
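As a rough illustration of the rule described above, here is a minimal sketch. It is not LDK's code: `HopSource`, `Hop`, and `fee_for_hop_msat` are hypothetical names, and the fee formula (base fee plus a proportional part in millionths) is the usual Lightning forwarding-fee shape. The point is only that the fee is zeroed whenever we are the forwarding node, whether the channel came from a first-hop hint or from the network graph.

```rust
/// Where the router learned about a hop (hypothetical, for illustration only).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum HopSource {
    FirstHopHint,
    NetworkGraph,
}

/// A simplified, hypothetical view of a candidate hop.
#[derive(Clone, Copy, Debug)]
struct Hop {
    source: HopSource,
    /// True if we are the node sending over this channel, i.e. it is our own channel.
    from_us: bool,
    base_fee_msat: u64,
    proportional_millionths: u64,
}

/// Fee charged for forwarding `amount_msat` over `hop`, or `None` on overflow.
/// We never charge ourselves, so any hop starting at our node costs nothing,
/// regardless of `hop.source`.
fn fee_for_hop_msat(hop: &Hop, amount_msat: u64) -> Option<u64> {
    if hop.from_us {
        return Some(0);
    }
    let proportional = amount_msat.checked_mul(hop.proportional_millionths)? / 1_000_000;
    hop.base_fee_msat.checked_add(proportional)
}
```

With this shape there is a single fee path for both kinds of first hop, which is what lets a common codepath replace the hint-specific one.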

These tests are a bit annoying to deal with and ultimately work on almost the same graph subset, so it makes sense to combine their graph layout logic and then call it twice. We do that here, combining them and also cleaning up the possible paths, as there actually are paths that the router could select which don't meet the tests' requirements.

In a coming commit we'll start calling `add_entries_to_cheapest_to_target_node` without always having a public-graph node entry in order to process last- and first-hops via a common codepath. In order to do so, we always need the `node_counter` for the node, however, and thus we track them in `RouteGraphNode` and pass them through to `add_entries_to_cheapest_to_target_node` here. We also take this opportunity to swap the node preference logic to look at the counters, which is slightly less computational work, though it does require some unrelated test changes.
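To show why comparing counters is cheap, here is a small sketch of a heap entry that orders by score and breaks ties on an integer counter. `GraphNodeEntry` and `pop_order` are hypothetical and much simpler than the real `RouteGraphNode`; the tie-break direction (lower counter first) is an assumption chosen for the example, not something the commit message specifies.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical, stripped-down stand-in for a router heap entry.
#[derive(PartialEq, Eq, Debug)]
struct GraphNodeEntry {
    /// Small integer identifying the node, used in place of a full public key.
    node_counter: u32,
    /// Lower is better.
    score: u64,
}

impl Ord for GraphNodeEntry {
    fn cmp(&self, other: &Self) -> Ordering {
        // `BinaryHeap` is a max-heap, so both comparisons are reversed to pop the
        // lowest score first and, on ties, the lowest counter first (an assumption).
        other
            .score
            .cmp(&self.score)
            .then_with(|| other.node_counter.cmp(&self.node_counter))
    }
}

impl PartialOrd for GraphNodeEntry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the node counters in the order the heap would process them.
fn pop_order(entries: Vec<GraphNodeEntry>) -> Vec<u32> {
    let mut heap: BinaryHeap<GraphNodeEntry> = entries.into_iter().collect();
    let mut order = Vec::new();
    while let Some(entry) = heap.pop() {
        order.push(entry.node_counter);
    }
    order
}
```

Changing the tie-break key changes which of several equal-score nodes is visited first, which is one plausible reason tests that depended on the old ordering needed adjusting.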

tnull

Cool, thanks!

Did a first pass.

lightning/src/routing/router.rs

TheBlueMatt · 2025-02-10T20:45:52Z

Addressed the comments and also pushed a bonus commit to clean up the router to return a sane error type instead of LightningError (which should also keep code from blowing up in a few places when we rustfmt soon).

valentinewallace

Looks good!

valentinewallace · 2025-02-11T23:27:33Z

lightning/src/routing/router.rs

@@ -2895,165 +2969,6 @@ where L::Target: Logger {
 				}
 			}
 		}
-		for route in payment_params.payee.unblinded_route_hints().iter()


Great to drop all this code 💯

lightning/src/routing/router.rs

tnull

LGTM, I think. Feel free to squash!

Confirmed the patch fixes the particular fuzz failure, and also ran the router target a bit.

TheBlueMatt · 2025-02-12T14:41:19Z

Pushed comment fixes for @valentinewallace, will squash once she's happy.

valentinewallace

Sorry for the delay, on a flight right now. Feel free to squash these nits in!

lightning/src/routing/router.rs

valentinewallace · 2025-02-12T23:01:14Z

lightning/src/routing/router.rs

-									{
-										old_entry.value_contribution_msat = value_contribution_msat;
-									}
+									old_entry.value_contribution_msat = value_contribution_msat;
 									hop_contribution_amt_msat = Some(value_contribution_msat);
 								} else if old_entry.was_processed && new_cost < old_cost {


Should we also use should_replace here?

No, we stick with new_cost < old_cost as should_replace is broader (incorporating equal-cost-but-higher-value replacement that we'd have to adapt the below debug assertions to exempt).

lightning/src/routing/router.rs

This likely only impacts very rare edge cases, but if we have two equal-cost paths, we should likely prefer ones which contribute more value (avoiding cases where we use paths which are amount-limited but equal fee to higher-amount paths) and then paths with fewer hops (which may complete faster). It does make test behavior more robust against router changes, which comes in handy over the coming commits.
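The preference order described here (lower cost, then higher value contribution, then fewer hops) can be sketched as a single comparison function. This is an illustration only: `PathCandidate` and `prefer_new_path` are hypothetical names, and the real router tracks considerably more state per hop.

```rust
/// Hypothetical summary of a path to some node, for illustration only.
#[derive(Clone, Copy, Debug)]
struct PathCandidate {
    /// Total cost (fees plus penalties) of using this path.
    cost_msat: u64,
    /// How much value this path is able to carry.
    value_contribution_msat: u64,
    /// Number of hops in the path.
    hops: u8,
}

/// Returns true if `new` should replace `old` as the preferred path.
fn prefer_new_path(old: &PathCandidate, new: &PathCandidate) -> bool {
    // 1. A strictly cheaper path always wins, a strictly costlier one always loses.
    if new.cost_msat != old.cost_msat {
        return new.cost_msat < old.cost_msat;
    }
    // 2. At equal cost, prefer the path that can carry more value.
    if new.value_contribution_msat != old.value_contribution_msat {
        return new.value_contribution_msat > old.value_contribution_msat;
    }
    // 3. At equal cost and value, prefer fewer hops.
    new.hops < old.hops
}
```

Because fully identical candidates never replace each other, the first one found is kept, which makes the outcome independent of later equal candidates.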

When we handle the unblinded last-hop route hints from an invoice, we had a good bit of code dedicated to handling fee propagation through the (potentially) multiple last-hops and connecting them to potentially directly-connected first-hops. This was a good bit of code that was almost never used, and it turns out was also buggy - we could process a route hint with multiple hops, committing to one path through nodes A, B, to C, then process another route hint (or public channel) which changes our best path from B to C, making the A entry invalid. Here we remove the whole maze, utilizing the normal hop-processing logic in `add_entries_to_cheapest_to_target_node` for last-hops as well. It requires tracking which nodes connect to last-hop hints similar to the way we do with `is_first_hop_target` in `PathBuildingHop`, storing the `CandidateRouteHop`s in a new map, and always calling `add_entries_to_cheapest_to_target_node` on the payee node, whether it's public or not.

When we do pathfinding with blinded paths, we start each pathfinding iteration by inserting all the blinded paths into our nodes map as last-hops to the destination. As we do that, we check if any of the introduction points happen to be nodes we have direct channels with, as we want to use the local info for such channels and support finding a path even if that channel is not publicly announced. However, as we iterate the blinded paths, we may find a second blinded path from the same introduction point which we prefer over the first. If this happens, we would already have added info from us over the local channel to that intro point and end up with calculations for the first hop to a blinded path that we no longer prefer. This is ultimately fixed here in two ways: (a) we process the first-hop channels to blinded path introduction points in a separate loop after we've processed all blinded paths, ensuring we only ever consider a channel to the blinded path we will ultimately prefer. (b) In the next commit, we add a new tracking bool in `PathBuildingHop` called `best_path_from_hop_selected`, which we set when we process a channel backwards from a node, indicating that we've committed to the best path to the node, and which we check when we add a new path to a node. This would have resulted in a much earlier debug-assertion in fuzzing or several tests.
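Fix (a) amounts to splitting one loop into two. The sketch below illustrates that ordering under simplifying assumptions: `BlindedPathSummary` and `first_hops_to_intro_points` are hypothetical, nodes are identified by plain integers, and "preferred" is reduced to "lowest cost".

```rust
use std::collections::HashMap;

/// Hypothetical summary of a blinded path, for illustration only.
struct BlindedPathSummary {
    /// Identifier of the path's introduction node.
    intro_node: u32,
    /// Cost of using this blinded path (lower is preferred).
    cost_msat: u64,
}

/// Returns `(peer, index of the preferred blinded path)` for each of our direct
/// peers that is also an introduction point.
fn first_hops_to_intro_points(
    paths: &[BlindedPathSummary],
    our_peers: &[u32],
) -> Vec<(u32, usize)> {
    // Loop 1: look at every blinded path and settle on the preferred one per
    // introduction node. No first-hop channels are touched yet.
    let mut best: HashMap<u32, usize> = HashMap::new();
    for (idx, path) in paths.iter().enumerate() {
        let replace = match best.get(&path.intro_node) {
            Some(&current) => path.cost_msat < paths[current].cost_msat,
            None => true,
        };
        if replace {
            best.insert(path.intro_node, idx);
        }
    }

    // Loop 2: only now connect our direct channels, so each one is paired with
    // the path we ultimately prefer and never with a superseded one.
    let mut connected = Vec::new();
    for &peer in our_peers {
        if let Some(&idx) = best.get(&peer) {
            connected.push((peer, idx));
        }
    }
    connected
}
```

Had the first-hop pairing happened inside the first loop, a peer that is the introduction node of two paths would have been paired with whichever path was seen first, even if a later one was cheaper.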

When we process a path backwards from a node during pathfinding, we implicitly commit to the path up to that node. Any changes to the preferred path up to that node will make the newly processed path's state invalid. In the previous few commits we fixed cases for this in last-hop paths (both blinded and unblinded). Here we add assertions to enforce this, tracked in a new bool in `PathBuildingHop`.
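The invariant can be sketched as a flag that is set once a node has been processed backwards and checked whenever a better path to that node is offered. Only the field name `best_path_from_hop_selected` comes from the commit messages; `HopState`, `offer_path`, and `mark_processed` are hypothetical, and the real `PathBuildingHop` carries much more state.

```rust
/// Hypothetical per-node pathfinding state, for illustration only.
#[derive(Debug, Default)]
struct HopState {
    /// Cost of the best known path from this node to the payee, if any.
    best_cost_msat: Option<u64>,
    /// Set once we have walked backwards from this node, committing to its best path.
    best_path_from_hop_selected: bool,
}

impl HopState {
    /// Offers a new path with the given cost. Returns true if it became the best path.
    fn offer_path(&mut self, cost_msat: u64) -> bool {
        let is_better = self.best_cost_msat.map_or(true, |best| cost_msat < best);
        if !is_better {
            return false;
        }
        // Changing the best path after committing would invalidate every hop
        // already built on top of it, so this is a bug in debug builds...
        debug_assert!(
            !self.best_path_from_hop_selected,
            "tried to replace the best path after it was committed to"
        );
        // ...and is simply refused in release builds.
        if self.best_path_from_hop_selected {
            return false;
        }
        self.best_cost_msat = Some(cost_msat);
        true
    }

    /// Records that hops leading to this node have been processed.
    fn mark_processed(&mut self) {
        self.best_path_from_hop_selected = true;
    }
}
```

Offering a worse path after `mark_processed` is harmless and just returns false; only an attempted improvement trips the assertion.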

The router is a somewhat complicated beast, and though the last few commits removed some code from it, a complicated beast it remains. Thus, having `expect`s in it is somewhat risky, so we take this opportunity to replace some of them with `debug_assert!(false)`s and an `Err`-return.
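The pattern looks roughly like the following sketch, which assumes a hypothetical lookup helper (`cost_for_node`) over a plain `HashMap`; the real router's data structures and error strings differ. An unexpected missing entry fails loudly in debug and test builds but degrades to an ordinary error in release builds instead of panicking.

```rust
use std::collections::HashMap;

/// Looks up the recorded cost for `node_counter`.
///
/// A version using `expect` would panic in production if the entry were ever
/// missing. Here the same situation trips a `debug_assert!` in debug builds and
/// returns an `Err` in release builds.
fn cost_for_node(dist: &HashMap<u32, u64>, node_counter: u32) -> Result<u64, &'static str> {
    match dist.get(&node_counter) {
        Some(cost) => Ok(*cost),
        None => {
            debug_assert!(false, "every processed node should have an entry");
            Err("Pathfinding state was inconsistent")
        }
    }
}
```

Callers can propagate the error with `?`, so a logic bug costs one failed route lookup rather than a crashed node.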

When we see a channel come into the router as a route-hint, but it's for a direct channel of ours, we'd like to ignore the route-hint as we have more information in the first-hop channel info. We do this by matching SCIDs, but previously only considered outbound SCID aliases. Here we change to consider both outbound SCID aliases and the full channel SCID, which some nodes may use in their invoices.
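A minimal sketch of the broadened match, assuming a hypothetical `OurChannel` struct with the two optional identifiers (the real first-hop channel type has many more fields) and a hypothetical helper `hint_is_our_channel`:

```rust
/// Hypothetical, trimmed-down view of one of our own channels.
struct OurChannel {
    /// The on-chain-derived SCID, once the channel is confirmed.
    short_channel_id: Option<u64>,
    /// The alias our counterparty may use to refer to the channel.
    outbound_scid_alias: Option<u64>,
}

/// Returns true if a route hint's SCID refers to one of our direct channels,
/// matching either the outbound alias or the full channel SCID.
fn hint_is_our_channel(channels: &[OurChannel], hint_scid: u64) -> bool {
    channels.iter().any(|chan| {
        chan.outbound_scid_alias == Some(hint_scid) || chan.short_channel_id == Some(hint_scid)
    })
}
```

Checking only `outbound_scid_alias` would miss an invoice whose hint uses the real SCID, leaving the router to treat our own channel as an unknown remote hop.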

`LightningError` is an error type for returning errors back to the `PeerHandler` when handling P2P messages. However, it used to be more broadly used, in a way that never made any sense. Here we remove one vestige of this, using a `&'static str` for router errors rather than `LightningError` with a constant `action`.

TheBlueMatt · 2025-02-13T01:49:08Z

Squashed with a few more comment tweaks:

$ git diff-tree -U1 0b3425701 0f3c4d26d
diff --git a/lightning/src/routing/router.rs b/lightning/src/routing/router.rs
index 9e59613ec..8fa9968bc 100644
--- a/lightning/src/routing/router.rs
+++ b/lightning/src/routing/router.rs
@@ -2880,9 +2880,9 @@ where L::Target: Logger {
 		// Step (2).
-		// Add entries for first-hop and last-hop channel hints to `dist` and add the target node
-		// as the best entry via `add_node`.
+		// Add entries for first-hop and last-hop channel hints to `dist` and add the payee node as
+		// the best entry via `add_entry`.
 		// For first- and last-hop hints we need only add dummy entries in `dist` with the relevant
 		// flags set. As we walk the graph in `add_entries_to_cheapest_to_target_node` we'll check
-		// those flags and use the hints.
-		// We then either add the target using `add_entries_to_cheapest_to_target_node` or add the
-		// blinded paths to the target using `add_entry`, filling `targets` and setting us up for
+		// those flags and add the channels described by the hints.
+		// We then either add the payee using `add_entries_to_cheapest_to_target_node` or add the
+		// blinded paths to the payee using `add_entry`, filling `targets` and setting us up for
 		// our graph walk.

TheBlueMatt · 2025-02-21T22:51:23Z

Backported (all but the last commit) in #3613

v0.1.2 - Apr 02, 2025 - "Foolishly Edgy Cases"

API Updates
===========

* `lightning-invoice` is now re-exported as `lightning::bolt11_invoice` (lightningdevkit#3671).

Performance Improvements
========================

* `rapid-gossip-sync` graph parsing is substantially faster, resolving a regression in 0.1 (lightningdevkit#3581).
* `NetworkGraph` loading is now substantially faster and does fewer allocations, resulting in a 20% further improvement in `rapid-gossip-sync` loading when initializing from scratch (lightningdevkit#3581).
* `ChannelMonitor`s for closed channels are no longer always re-persisted immediately after startup, reducing on-startup I/O burden (lightningdevkit#3619).

Bug Fixes
=========

* BOLT 11 invoices longer than 1023 bytes long (and up to 7089 bytes) now properly parse (lightningdevkit#3665).
* In some cases, when using synchronous persistence with higher latency than the latency to communicate with peers, when receiving an MPP payment with multiple parts received over the same channel, a channel could hang and not make progress, eventually leading to a force-closure due to timed-out HTLCs. This has now been fixed (lightningdevkit#3680).
* Some rare cases with multi-hop BOLT 11 route hints or multiple redundant blinded paths could have led to the router creating invalid `Route`s were fixed (lightningdevkit#3586).
* Corrected the decay logic in `ProbabilisticScorer`'s historical buckets model. Note that by default historical buckets are only decayed if no new datapoints have been added for a channel for two weeks (lightningdevkit#3562).
* `{Channel,Onion}MessageHandler::peer_disconnected` will now be called if a different message handler refused connection by returning an `Err` from its `peer_connected` method (lightningdevkit#3580).
* If the counterparty broadcasts a revoked state with pending HTLCs, those will now be claimed with other outputs which we consider to not be vulnerable to pinning attacks if they are not yet claimable by our counterparty, potentially reducing our exposure to pinning attacks (lightningdevkit#3564).

TheBlueMatt added 3 commits February 3, 2025 22:13

TheBlueMatt added the backport 0.1 label Feb 3, 2025

tnull mentioned this pull request Feb 4, 2025

Router: Ensure used liquidity is always limited by hop's htlc_max #3553

Closed

tnull self-requested a review February 4, 2025 11:28

TheBlueMatt added the weekly goal Someone wants to land this this week label Feb 6, 2025

tnull reviewed Feb 10, 2025


TheBlueMatt force-pushed the 2025-02-router-fixes branch from eb393ea to c419314 Compare February 10, 2025 20:45

valentinewallace self-requested a review February 11, 2025 17:14

valentinewallace reviewed Feb 12, 2025


tnull reviewed Feb 12, 2025


TheBlueMatt force-pushed the 2025-02-router-fixes branch from c419314 to 0b34257 Compare February 12, 2025 14:41

valentinewallace reviewed Feb 12, 2025


TheBlueMatt added 7 commits February 13, 2025 01:48

TheBlueMatt force-pushed the 2025-02-router-fixes branch from 0b34257 to 0f3c4d2 Compare February 13, 2025 01:48

tnull approved these changes Feb 13, 2025


valentinewallace approved these changes Feb 13, 2025


valentinewallace merged commit 11d12d1 into lightningdevkit:main Feb 13, 2025
24 of 26 checks passed

TheBlueMatt removed the backport 0.1 label Feb 21, 2025
