by achiang
2193 days ago
> (You will notice that people like Google and Cloudflare skillfully respond with only one record with a 5 minute TTL. That is so the behavior of the browser is well defined, but it also eats their entire year of 99.999% uptime with one bad reply. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)

This chapter in the Google SRE book explains how our load balancing DNS works: https://landing.google.com/sre/sre-book/chapters/load-balanc...

Source: my team runs this service
|
It might be a little more resilient than even a very minimal nginx, but more than that, I think it must give you more control over what happens when a packet is not "answered" within some set amount of time: you write off the server that should have answered, then resend that same packet to another server. Keep a buffer of packets, remove them from the buffer when the answerer ACKs them, and resend them to another answerer if they are not ACKed within the timeout.
Am I guessing correctly?
It seems a bit overcomplicated for normal use cases, but adequate at a scale like Google's.
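
To make the guess concrete, here is a minimal sketch of the buffer-and-resend scheme described above. It is only an illustration of my guess, not how Google's load balancer is actually implemented. The names `RetryBuffer`, `Pending`, `send`, `ack` and `tick` are all hypothetical, the clock is passed in explicitly, and `sent_log` stands in for the real network.

```python
from dataclasses import dataclass, field


@dataclass
class Pending:
    payload: bytes
    backend: str    # answerer the packet was last sent to
    sent_at: float  # time of the last send, in seconds


@dataclass
class RetryBuffer:
    backends: list        # hypothetical answerer names, in preference order
    timeout: float = 1.0  # assumed ACK deadline, in seconds
    pending: dict = field(default_factory=dict)    # packet_id -> Pending
    written_off: set = field(default_factory=set)  # answerers that missed a deadline
    sent_log: list = field(default_factory=list)   # stand-in for the network

    def _pick(self):
        # First answerer that has not been written off.
        for backend in self.backends:
            if backend not in self.written_off:
                return backend
        raise RuntimeError("no healthy answerer left")

    def send(self, packet_id, payload, now):
        # Buffer the packet until it is ACKed.
        backend = self._pick()
        self.pending[packet_id] = Pending(payload, backend, now)
        self.sent_log.append((packet_id, backend))

    def ack(self, packet_id):
        # The answerer replied: drop the packet from the buffer.
        self.pending.pop(packet_id, None)

    def tick(self, now):
        # Resend every packet whose ACK deadline has passed.
        for packet_id, p in list(self.pending.items()):
            if now - p.sent_at >= self.timeout:
                self.written_off.add(p.backend)  # write off the silent answerer
                p.backend = self._pick()
                p.sent_at = now
                self.sent_log.append((packet_id, p.backend))
```

For example, with `RetryBuffer(backends=["a", "b"])`, sending two packets at `now=0.0`, ACKing only the first, and calling `tick(now=1.5)` writes off `"a"` and resends the second packet to `"b"`. A real system would presumably also need some way to bring written-off answerers back, which this sketch leaves out.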