

by achiang 2193 days ago

> (You will notice that people like Google and Cloudflare skillfully respond with only one record with a 5 minute TTL. That is so the behavior of the browser is well defined, but it also eats their entire year of 99.999% uptime with one bad reply. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)
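The arithmetic behind the quoted claim can be checked with a short sketch. It assumes a 365-day year and, as a simplification that is not stated in the quote, that one bad reply makes the service unavailable to every client for the full 5-minute TTL. The function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def error_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)


budget = error_budget_minutes(0.99999)  # "five nines"
ttl_minutes = 5                         # TTL mentioned in the quote
# Assumption: the bad record is cached by all clients for the whole TTL.
fraction_used = ttl_minutes / budget

print(f"{budget:.2f} minutes/year; one bad reply uses {fraction_used:.0%}")
```

Under those assumptions the yearly budget is a little over five minutes, so a single bad reply cached for one TTL consumes nearly all of it.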

This chapter in the Google SRE book explains how our load balancing DNS works:

https://landing.google.com/sre/sre-book/chapters/load-balanc...

Source: my team runs this service

1 comments

1996 2193 days ago

I skimmed through it. Not a bad idea: instead of using a reverse proxy, you are basically doing a poor man's multicast by letting many servers answer a request. And instead of rewriting the packets, you encapsulate them, which should be lighter and faster.

It might be a little more resilient than even a very minimal nginx, but more than that, I think it must give you more control over what happens when a packet is not "answered" after some set amount of time: you write off whoever should have answered, then resend that same packet to another server. Keep a buffer of packets, remove them from the buffer once they are ACK'ed by the answerer, and resend them to another answerer if they are not ACK'ed after some set amount of time.

Am I guessing correctly?

It seems a bit overcomplicated for normal use cases, but adequate for a large scale like Google's.
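The buffer-and-resend scheme guessed at in this comment can be sketched roughly as follows. This is only an illustration of the commenter's proposal, not a description of how Google's load balancer works; the class and method names, the modulo backend choice, and the explicit `now` timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pending:
    payload: bytes
    backend: str
    sent_at: float


@dataclass
class RetryingForwarder:
    """Hypothetical forwarder that buffers packets until they are ACK'ed."""
    backends: list[str]
    timeout: float = 1.0
    pending: dict[int, Pending] = field(default_factory=dict)
    dead: set[str] = field(default_factory=set)

    def _pick(self, packet_id: int) -> str:
        # Choose among backends that have not been written off.
        live = [b for b in self.backends if b not in self.dead]
        if not live:
            raise RuntimeError("no live backends")
        return live[packet_id % len(live)]

    def send(self, packet_id: int, payload: bytes, now: float) -> str:
        backend = self._pick(packet_id)
        # The packet stays buffered until ack() removes it.
        self.pending[packet_id] = Pending(payload, backend, now)
        return backend

    def ack(self, packet_id: int) -> None:
        self.pending.pop(packet_id, None)

    def tick(self, now: float) -> list[tuple[int, str]]:
        """Write off backends that timed out and resend their packets."""
        resent = []
        for packet_id, p in list(self.pending.items()):
            if now - p.sent_at >= self.timeout:
                self.dead.add(p.backend)
                resent.append((packet_id, self.send(packet_id, p.payload, now)))
        return resent
```

Note that the `pending` table is per-packet state the forwarder has to hold for every in-flight packet, which is the cost of this approach.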


achiang 2192 days ago

The design you propose is stateful, and if you read the chapter closely, you can see we spend a lot of effort to make things stateless.

The main point I wanted to respond to in this thread, that a single bad server can destroy your yearly SLO, is addressed in the first paragraph of the section on load balancing at the virtual IP address.


blueblisters 2192 days ago

Sorry, I couldn't find a clear rationale in the link. Why does Google prefer a stateless load balancer? Is it infeasible to maintain state at that scale?


1996 2192 days ago

Sorry, I didn't read the document that closely. It was a bit too long.

Overall, virtual IPs are still an interesting solution.


jrockway 2192 days ago

You might like the actual paper: https://static.googleusercontent.com/media/research.google.c...
