Description
When we receive an UPDATE message, we currently filter out any prefixes that we're originating (e.g. origin4) from both the nlri and withdrawn fields. This can lead to correctness problems that require administrative action to resolve.
Scenario:
1. Peer advertises us a route that we are originating, and we discard it.
2. We stop advertising that route.
3. Our withdrawal doesn't cause the peer to choose a different bestpath for that route, so the peer has no reason to re-advertise it to us.
4. We no longer have a path for that route from this peer.
If no action is taken after (4), then we are left missing the route that we were previously originating.
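The scenario above can be replayed with a toy model. This is only a sketch: `Speaker`, `receive_update`, and `replay_scenario` are hypothetical names for illustration, not maghemite's actual types, and prefixes are plain strings to keep it short.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, heavily simplified BGP speaker (not maghemite's real types).
struct Speaker {
    /// Whether received prefixes that we originate are discarded.
    filter_originated: bool,
    /// Prefixes we are currently originating.
    originated: HashSet<String>,
    /// Paths learned from peers: prefix -> peers that advertised it.
    rib_in: HashMap<String, Vec<String>>,
}

impl Speaker {
    fn new(filter_originated: bool) -> Self {
        Speaker {
            filter_originated,
            originated: HashSet::new(),
            rib_in: HashMap::new(),
        }
    }

    /// Process the nlri field of an UPDATE received from `peer`.
    fn receive_update(&mut self, peer: &str, nlri: &[&str]) {
        for prefix in nlri {
            if self.filter_originated && self.originated.contains(*prefix) {
                continue; // step 1: the advertisement is silently discarded
            }
            self.rib_in
                .entry(prefix.to_string())
                .or_default()
                .push(peer.to_string());
        }
    }

    fn has_path(&self, prefix: &str) -> bool {
        self.rib_in.get(prefix).map_or(false, |peers| !peers.is_empty())
    }
}

/// Replays steps 1-4 and reports whether we still hold a path afterwards.
fn replay_scenario(filter_originated: bool) -> bool {
    let prefix = "10.0.0.0/24";
    let mut speaker = Speaker::new(filter_originated);
    speaker.originated.insert(prefix.to_string());
    speaker.receive_update("peer-a", &[prefix]); // step 1
    speaker.originated.remove(prefix); // step 2
    // Step 3: the peer's bestpath is unchanged, so no new UPDATE arrives.
    speaker.has_path(prefix) // step 4
}
```

In this model, `replay_scenario(true)` ends with no path for the prefix, while `replay_scenario(false)` keeps the path learned from the peer.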
Removing an originated route is a pretty normal operation, especially if you're doing initial setup and misconfigure something accidentally.
I believe we should investigate removing the originated-routes filter, as it is ultimately the source of the potential correctness problem.
This missing route could also be addressed by:
A) Sending a route-refresh to the peer when we stop originating routes.
B) Inserting a Local path into the RIB for routes we are originating which is preferred over BGP paths learned from peers during bestpath.
(A) is not a good idea because we don't know how large the peer's RIB will be and we'd have to re-process their entire RIB every time we stop advertising even a single route.
(B) is fine conceptually, as most routing stacks use this approach:
When using a "network" statement to pull routes from other protocols into the BGP topology, or when creating an aggregate, a locally originated path generally goes into the BGP Loc-RIB and is preferred over any peer-learned path.
If no matching routing table entry exists, then this generally creates a blackhole route for the originated prefix (in order to prevent routing loops from occurring as a result of missing entries for the component routes).
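As a rough illustration of option (B), here is a minimal sketch of a Loc-RIB where a Local path always wins bestpath and the peer-learned path is kept alongside it. `LocRib`, `Path`, and the reduced two-step comparison (local origin first, then `local_pref`) are assumptions made for this example; they are not maghemite's data structures or the full BGP decision process.

```rust
use std::collections::HashMap;

/// Hypothetical path type: either locally originated or learned from a peer.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Path {
    Local,
    Peer { peer: String, local_pref: u32 },
}

/// Hypothetical Loc-RIB keyed by prefix string.
#[derive(Default)]
struct LocRib {
    paths: HashMap<String, Vec<Path>>,
}

impl LocRib {
    fn add(&mut self, prefix: &str, path: Path) {
        self.paths.entry(prefix.to_string()).or_default().push(path);
    }

    /// Called when we stop originating `prefix`; peer paths are untouched.
    fn remove_local(&mut self, prefix: &str) {
        if let Some(paths) = self.paths.get_mut(prefix) {
            paths.retain(|p| *p != Path::Local);
        }
    }

    /// Simplified bestpath: Local beats any peer path, then higher local_pref.
    fn bestpath(&self, prefix: &str) -> Option<&Path> {
        self.paths.get(prefix)?.iter().max_by_key(|p| match p {
            Path::Local => (1u8, 0u32),
            Path::Peer { local_pref, .. } => (0u8, *local_pref),
        })
    }
}
```

Because the peer's path is stored rather than filtered, removing the Local path lets bestpath fall back to it without needing a route-refresh.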
In the case of maghemite, well, really in the case of dendrite, this routing loop prevention is done without using blackhole routes: packets arriving on a front panel port that encounter a NAT "miss" are dropped rather than submitted to an LPM lookup in the External routing table (avoiding us becoming a transit router + hairpinning packets sent to us from outside the rack).
All that to say: we don't need local routes for any kind of forwarding (or blackholing) data plane behavior.
This brings us to the crux of the issue: Inter-VPC routing.
Since we don't currently have a native inter-VPC data plane within the rack, we need an installed External route that covers our originated prefix, regardless of what mask is used. If we filter on an exact match, we just make it harder for our peer to advertise us a route we need: we effectively require the peer to advertise us reachability to our own prefix, so long as the mask doesn't match what we're using (e.g. if we send a /24, they have to send us any mask other than a /24 that covers the original IP range).
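To make the exact-match behavior concrete, here is a small sketch. `Prefix4`, `covers`, and `passes_exact_match_filter` are hypothetical names for this example and assume prefix lengths of at most 32; the real filter may be implemented differently.

```rust
use std::net::Ipv4Addr;

/// Hypothetical IPv4 prefix; assumes `len <= 32`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Prefix4 {
    addr: Ipv4Addr,
    len: u8,
}

impl Prefix4 {
    fn new(addr: [u8; 4], len: u8) -> Self {
        Prefix4 { addr: Ipv4Addr::from(addr), len }
    }

    /// Network mask for a prefix length, e.g. 24 -> 0xffff_ff00.
    fn mask(len: u8) -> u32 {
        if len == 0 { 0 } else { u32::MAX << (32 - u32::from(len)) }
    }

    /// True if every address in `other` also falls inside `self`.
    fn covers(&self, other: &Prefix4) -> bool {
        let m = Self::mask(self.len);
        self.len <= other.len
            && (u32::from(self.addr) & m) == (u32::from(other.addr) & m)
    }
}

/// Exact-match filter: a received prefix is dropped only when it is
/// identical (address and mask) to one we originate.
fn passes_exact_match_filter(originated: &[Prefix4], received: Prefix4) -> bool {
    !originated.contains(&received)
}
```

With `10.0.0.0/24` originated, the peer's `10.0.0.0/24` is dropped, but `10.0.0.0/16` or `0.0.0.0/0` passes the filter and still covers the originated range, which is the only way the peer can give us the External route described above.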
This will likely need to be evaluated in the context of attaching external subnets to instances, as that will modify the data plane / control plane interactions described above.