by toast0 1715 days ago
They didn't revoke all their routes, FWIW, just a lot of them (including the anycast DNS routes).

What I don't understand is why, when a route is revoked and no other route is announced, the routing table gets updated at all. It seems that either the destination becomes a black hole, or the route still works and the revocation was a BGP error (or the route works but the resources aren't present, so traffic would be dropped anyway). What's the reason for designing the system to revoke routes when no new route is announced?

It strikes me that it's like DNS: when you get a SERVFAIL, why not try the prior IP address? The similarity in the design here suggests there may be common reasoning.

Routing with BGP operates on the principle of using the 'most specific prefix': when more than one announced route covers a destination, the route covering fewer IPs (the longer prefix) is more specific and should be used.

When the announcement is revoked, you fall back to a less specific prefix if present, or your default route.
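
A minimal sketch of that fallback, assuming a toy in-memory table rather than a real BGP implementation. The `RoutingTable` class and the next-hop labels `"peer"` and `"transit"` are hypothetical names for illustration, and 192.0.2.0/24 is a documentation prefix:

```python
import ipaddress


class RoutingTable:
    """Toy routing table: longest-prefix match over announced prefixes."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # Withdrawing just removes the entry; nothing is remembered.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no covering route at all: traffic is dropped
        # The most specific route is the one with the longest prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("0.0.0.0/0", "transit")    # intentional default route
table.announce("192.0.2.0/24", "peer")    # more specific announcement

before = table.lookup("192.0.2.53")       # "peer"
table.withdraw("192.0.2.0/24")
after = table.lookup("192.0.2.53")        # "transit"
```

With the default route also withdrawn, `lookup` returns `None`, which is the black-hole case from the question.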

If you've got a full BGP table, then you tend not to have a useful default route (you should have specific routes for everything), and it might be useful to fall back to the last known value. But many participants have an intentional default route and only get announcements for special traffic; for them, dropping the announcement means sending that traffic on the default route instead. It's hard to know what the right thing to do is, so it's better to do what you were told by the authority.

DNS is a bit different, but again, the authority told you to use some data and how long to keep it (the TTL). If they're not there to tell you a value again later, what else can you do but report an error? Some DNS servers have configurable behavior to continue using old data while fetching new data or when new data is unavailable.
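
Roughly what that configurable behavior looks like, as a sketch and not any particular DNS server's implementation. `StaleCache`, its `serve_stale` flag, and the `resolver` callable (returning a `(value, ttl)` pair or raising `LookupError`) are all made up for illustration:

```python
import time


class StaleCache:
    """Toy TTL cache that can optionally serve expired data on failure."""

    def __init__(self, resolver, serve_stale=False, clock=time.monotonic):
        self.resolver = resolver        # callable: name -> (value, ttl_seconds)
        self.serve_stale = serve_stale  # hypothetical knob, off by default
        self.clock = clock
        self.entries = {}               # name -> (value, expires_at)

    def lookup(self, name):
        now = self.clock()
        entry = self.entries.get(name)
        if entry and now < entry[1]:
            return entry[0]             # still within the TTL
        try:
            value, ttl = self.resolver(name)
        except LookupError:
            if self.serve_stale and entry:
                return entry[0]         # authority unreachable: reuse old data
            raise                       # strict mode: report the error
        self.entries[name] = (value, now + ttl)
        return value
```

With `serve_stale=False` an expired entry plus a failing resolver surfaces the error, which is the default behavior described above; with `serve_stale=True` the last known value keeps being returned.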

But the expectation is that if you can't keep your BGP up and your DNS up, your server probably isn't up either. Note that in this case, bypassing DNS and going to the FB Edge PoPs that were still reachable (because of different BGP announcements that weren't withdrawn) resulted in errors, because they weren't able to connect to the upstream data centers. (Or so it seems.)

Withdrawing the last route covering an IP range is a legitimate change, indicating that those IP addresses are no longer in use by the ASN. This needs to be supported so that one ASN can withdraw an IP block and transfer it to another ASN.
Basically the same problem as tag soup. As soon as mistakes don't stop things working you get many more mistakes.