"Due to Facebook stopping announcing their DNS prefix routes through BGP, our and everyone else's DNS resolvers had no way to connect to their nameservers. Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.
But that's not all. Now human behavior and application logic kicks in and causes another exponential effect. A tsunami of additional DNS traffic follows.
This happened in part because apps won't accept an error for an answer and start retrying, sometimes aggressively, and in part because end-users also won't take an error for an answer and start reloading the pages, or killing and relaunching their apps, sometimes also aggressively."
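To put rough numbers on the retry effect described in that quote, here is a minimal sketch that counts how many attempts a single client makes while every attempt fails. The function name, the 60-second window and the delay values are all hypothetical and chosen only for illustration; they are not measurements from the outage.

```python
def attempts_in_window(window_s, first_delay_s, factor=1.0, max_delay_s=None):
    """Count attempts one client makes in `window_s` seconds if every attempt fails.

    factor=1.0 models a fixed retry interval; factor>1 models growing delays.
    All parameters are illustrative assumptions, not values from the outage.
    """
    attempts = 1          # the initial request
    elapsed = 0.0
    delay = first_delay_s
    while elapsed + delay <= window_s:
        elapsed += delay
        attempts += 1     # one more retry fires inside the window
        delay *= factor
        if max_delay_s is not None:
            delay = min(delay, max_delay_s)
    return attempts


# A client retrying every second vs. one that doubles its delay each time.
fixed = attempts_in_window(60, 1)
doubling = attempts_in_window(60, 1, factor=2)
print(fixed, doubling)
```

Multiply the per-client difference by the number of apps and users hammering reload, and the "tsunami" in the quote is easy to picture.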
Cloudflare has a useful tool for measuring if your ISP is using RPKI.[0] For Facebook, this is the latest I could find for their implementation of BGP.[1][2]
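For anyone wondering what "using RPKI" checks in practice, here is a rough sketch of route origin validation in the spirit of RFC 6811. The `origin_validation` helper, the ROA tuple layout, and the prefixes and AS numbers (taken from the documentation ranges) are assumptions for illustration, not Cloudflare's or Facebook's actual data or code.

```python
import ipaddress


def origin_validation(roas, prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'.

    roas: list of (roa_prefix, max_length, asn) tuples (hypothetical layout).
    """
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA only matters if it covers the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs the right origin AS and a prefix no longer than max_length.
        if asn == origin_asn and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [("198.51.100.0/24", 24, 64496)]
print(origin_validation(roas, "198.51.100.0/24", 64496))  # expected origin
print(origin_validation(roas, "198.51.100.0/25", 64496))  # more specific than allowed
print(origin_validation(roas, "203.0.113.0/24", 64496))   # no covering ROA
```

Note that this only checks who originates a prefix. It does nothing when the legitimate origin withdraws its own routes.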
I was banging on about this over 20 years ago with some of the people who are probably here. I'm not sure what the issue with FB was, as I'm not on NANOG anymore, but if it's BGP, there's a short list of likely events, as I foggily remember:
- someone big redistributed their static routes for FB into their announcements to peers.
- someone who has mapped peer filters and their prefix lengths has figured out how to announce smaller prefixes for FB routes and have them propagate.
- someone with enable somewhere in one of the major ASNs (like 701 back in my day etc) is doing a straightforward attack on FB.
- someone inside FB messed with load balancing and prepended a bunch of their routes internally and redistributed the long AS paths themselves and just broke shit with internal routing loops.
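A couple of the scenarios above (smaller prefixes propagating, prepended AS paths) come down to how a route gets picked. Here is a much simplified sketch: longest matching prefix wins, then the shorter AS path. Real BGP best-path selection has more steps (local preference and so on) that are left out here, and the `best_route` helper, prefixes and AS numbers are hypothetical, taken from the documentation ranges.

```python
import ipaddress


def best_route(routes, destination):
    """Pick the preferred route for `destination`, or None if nothing covers it.

    routes: list of (prefix, as_path) tuples (hypothetical layout).
    Simplified rule: longest prefix first, then shortest AS path.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), as_path)
        for prefix, as_path in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None  # no covering route: the destination is unreachable
    net, as_path = max(candidates, key=lambda r: (r[0].prefixlen, -len(r[1])))
    return str(net), as_path


# A more specific /25 beats the legitimate /24, whatever the AS path says.
hijack = [
    ("198.51.100.0/24", [64500, 64496]),
    ("198.51.100.0/25", [64501, 64511]),
]
print(best_route(hijack, "198.51.100.10"))

# With equal prefix lengths, the heavily prepended path loses.
prepended = [
    ("198.51.100.0/24", [64500, 64496, 64496, 64496]),
    ("198.51.100.0/24", [64502, 64496]),
]
print(best_route(prepended, "198.51.100.10"))
```

An empty route list returns `None`, which is the "nobody is announcing this prefix" case.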
I have no idea how people unbefunge routing problems now that you have to coordinate multiple teams on the phone to get anything done, instead of one router guru just logging into everything and fixing it. I would be useless at it now, but this is not a recent problem. If it's still a problem, it will always be a problem.
> While there have been a number of ambitious proposals intended to make BGP more secure, these are hard to implement because they would require every autonomous system to simultaneously update their behavior. Since this would require the coordination of hundreds of thousands of organizations and potentially result in a temporary takedown of the entire Internet, it seems unlikely that any of these major proposals will be put into place anytime soon.
You might be entertained to know that this is exactly what happened when 'the net' switched from NCP to TCP/IP -- there was a 'flag day' and poof! we were henceforth on TCP. So, it can be (successfully) done.
Why can't they at least start publishing who is advertising what? After, say, 1 year we would have most if not all … gradually we can build a grey BGP, not all white, but at least in case if some … wonder. Or any other option. Total trust is so untrustworthy.