by vlovich123 1890 days ago
I wonder if a robust consensus algorithm might be a better investment than a timeout. I would imagine there are other bugs in BGP implementations, so a routing table that trends towards eventual consistency regardless of the starting point might be a more robust solution than focusing on this one corner case. It might be a more intrusive change, though, and hard to get middleware to roll out?
3 comments

The goal of BGP or any other routing protocol isn't consensus, though. Each router really just wants to find a next hop for every destination, and there are lots of reasons for routers to end up with different answers.

In the case where there are multiple next hops to choose from, it might be nice to have some sort of quality metric to decide, but that's really tricky to measure and integrate. It's really outside the scope of BGP.

You would need to instrument packet loss or transfer speed or something like that by destination path on your application servers (or load balancers), and be able to adjust the proportion of traffic through various paths; keeping in mind that you can't really influence the path beyond your routers or the return path.

It's a lot of work, and I don't think the tools are built for it, but I would love to work on it for someone. I did something similar for SMS routing, but that's a ton easier: fewer choices, less traffic, clearer success, etc.
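
To make the idea of measuring by path and shifting traffic proportions concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the path names, the `(sent, lost)` counters, the smoothing, and the `floor` parameter are hypothetical, and a real setup would have to collect these numbers on the application servers or load balancers. It only picks the outbound path; it can't influence anything beyond your own routers or the return path.

```python
import random

def path_weights(stats, floor=0.05):
    """Map {path: (sent, lost)} to traffic proportions that sum to 1."""
    scores = {}
    for path, (sent, lost) in stats.items():
        # Smoothed loss estimate, so a path with no samples starts at 0.5
        # instead of dividing by zero.
        loss = (lost + 1) / (sent + 2)
        # The floor keeps a trickle of traffic on bad paths so they keep
        # being measured and can earn their share back.
        scores[path] = max(1.0 - loss, floor)
    total = sum(scores.values())
    return {path: score / total for path, score in scores.items()}

def pick_path(weights, rng=random):
    """Choose an outbound path at random, in proportion to its weight."""
    paths = list(weights)
    return rng.choices(paths, weights=[weights[p] for p in paths], k=1)[0]

# Hypothetical counters: (packets sent, packets lost) per outbound path.
stats = {"transit_a": (1000, 5), "transit_b": (1000, 200)}
weights = path_weights(stats)
```

With these made-up numbers, `transit_a` ends up with the larger share, and `pick_path(weights)` would be called per connection or per flow rather than per packet to avoid reordering.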

There isn't actually a consensus to be formed on the Internet. Communities, local configuration, etc. cause BGP routers to make local decisions about which routes to advertise and re-advertise, and those decisions aren't going to be part of a consensus.
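
A toy sketch of what "local decisions" means here, assuming a heavily simplified best-path rule (highest local preference, then shortest AS path, which are the first steps of the real decision process; everything after that is left out). The `Route` type, the AS numbers, and the addresses are hypothetical documentation values, not taken from any real configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple  # first element is the neighbor AS the route came from

def best_route(routes, local_pref):
    """Pick a route using only local preference, then AS path length.

    local_pref is this router's own policy: neighbor AS -> preference.
    Neighbors not listed get 100, the conventional default.
    """
    def key(route):
        pref = local_pref.get(route.as_path[0], 100)
        return (-pref, len(route.as_path))
    return min(routes, key=key)

# Both routers hear the same two routes for the same prefix.
routes = [
    Route("203.0.113.0/24", "192.0.2.1", (64500, 64510)),
    Route("203.0.113.0/24", "198.51.100.1", (64501, 64502, 64510)),
]

router_a = best_route(routes, local_pref={})            # no policy: shortest path
router_b = best_route(routes, local_pref={64501: 200})  # prefers neighbor AS 64501
```

The two routers see identical inputs and still choose different next hops, and both choices are correct for their operators. There is no single state for them to agree on.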
Given the size and complexity of the Internet, it might be worth considering making BGP tolerant to Byzantine failures.
There isn't a really useful metric for failure, though. Not every prefix of every AS needs to be reachable from every other AS. Unlike consensus problems, where everyone wants to agree on the same state, it's sufficient for the Internet to be in a working state where each AS has enough routes to the destinations it cares about, and BGP is pretty good at achieving that.