v4.0: xdp: switch to lpm trie for routing (backport of #11465)#11538
v4.0: xdp: switch to lpm trie for routing (backport of #11465)#11538mergify[bot] wants to merge 1 commit intov4.0from
Conversation
|
Cherry-pick of da1dcd7 has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
* xdp: refactor router Split RouterTables from Router. Make RouteMonitor update RouterTables and then periodically build a Router from a snapshot of RouterTables. This is a preparatory change before introducing a pre built radix tree for routing. * xdp: store Route<T> in router Instead of storing RouteEntry, store a specialized Route<T> where T is the address type - v4 or v6. Implement only v4 for now. This allows moving address family branching at build time instead of lookup time. It also reduces the memory used by each entry making lookup more cache friendly. This change removes ipv6 routing which was only half working anyway. * xdp: add simple (but fast) longest prefix router Switch the router to use a compressed nibble radix tree. The tree is immutable and optimized for lookup speed. It is periodically rebuilt when the routing table changes. * xdp: explicitly use the main routing table only It was already the case that we only supported the main table. This makes it explicit and adds the necessary plumbing to support other tables in the future.
767dea1 to
1c92675
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## v4.0 #11538 +/- ##
=========================================
- Coverage 83.0% 82.9% -0.1%
=========================================
Files 838 839 +1
Lines 316793 317165 +372
=========================================
- Hits 263229 263223 -6
- Misses 53564 53942 +378 🚀 New features to boost your workflow:
|
| } | ||
| atomic_router.store(Arc::new(router)); | ||
| self.last_publish = Instant::now(); | ||
| self.dirty = false; |
There was a problem hiding this comment.
should we conditionally set this false on success only?
There was a problem hiding this comment.
same as below, from_tables essentially never fails.
Even if it failed, from_tables(tables) with the same tables would fail in exactly the same way, unless the tables change, in which case dirty=true.
| .expect("error creating RoutingTables"); | ||
| let router = Router::from_tables(tables.clone()).expect("error creating Router"); |
There was a problem hiding this comment.
this can never fail
inside from_tables we do
router.cached_default_route = match router.default_route() {
Ok(hop) => Some(hop),
Err(RouteError::NoRouteFound(_)) => None,
Err(e) => return Err(io::Error::other(e)),
};
Ok(router)
and default_route() fails only with NoRouteFound.
I'll rework the errors in master so this gets less scary but yeah it's effectively an impossible condition.
The initial routing implementation in agave-xdp assumed a static routing table with only a handful of entries (one per interface + default).
DZ breaks that assumption by installing hundreds of routes (~550 on mnb today). The result is that routing when DZ is on doesn't work at all, the XDP thread is pegged at 100% cpu routing, which leads to skipped blocks and the XDP channel filling up.
This PR fixes the issue by reworking routing and implementing a 4-bit (nibble) trie for lookups.
before
after
This is an automatic backport of pull request #11465 done by [Mergify](https://mergify.com).