Hacker News
by alttab 2887 days ago
"Within Google, we implement periodic downtime in some services to prevent a service from being overly available."

Uh..... what?

6 comments

Services have different relationships with each other, both in terms of actual dependencies and in terms of what you think those dependencies are.

If your idea of how things work is that services A, B, and C can optionally use service D and otherwise fall back to some other process, then as long as D has never failed, that fallback process has never actually run. Nor have services X, Y, and Z, which rely on A, B, and C, ever had to cope with those services running on their fallbacks. So instead of waiting for D to fail, you can take it down at a convenient time.

This applies to services as a whole, or services within a locality, or all services in some availability zone.
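A minimal Python sketch of the optional-dependency pattern described above. `ServiceD`, `DependencyUnavailable`, `fallback_lookup`, and the `in_planned_downtime` flag are all hypothetical names, not anything from Google's actual stack; the point is only that flipping D into planned downtime forces A onto its fallback path at a time you chose.

```python
class DependencyUnavailable(Exception):
    """Raised by the hypothetical service D when it is down."""


class ServiceD:
    """Toy stand-in for service D; `in_planned_downtime` is a hypothetical flag."""

    def __init__(self) -> None:
        self.in_planned_downtime = False

    def lookup(self, key: str) -> str:
        if self.in_planned_downtime:
            raise DependencyUnavailable("D is in planned downtime")
        return f"fresh:{key}"


def fallback_lookup(key: str) -> str:
    # Hypothetical degraded path, e.g. serving from a stale local cache.
    return f"stale:{key}"


def service_a_handle(d: ServiceD, key: str) -> str:
    """Service A optionally uses D and falls back when D is unavailable."""
    try:
        return d.lookup(key)
    except DependencyUnavailable:
        return fallback_lookup(key)


if __name__ == "__main__":
    d = ServiceD()
    print(service_a_handle(d, "user42"))  # fresh:user42
    d.in_planned_downtime = True  # the scheduled, convenient-time outage
    print(service_a_handle(d, "user42"))  # stale:user42
```

If the fallback is broken, you find out during the planned window rather than during an unplanned outage of D.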

Read the full context of that quote. There's even more in the SRE book.

"Don’t make your system overly reliable if you don’t intend to commit to it to being that reliable"

If a service has exceeded its reliability target for a given time period, you can take it down, basically to let users know that this can happen and that they shouldn't expect more.

You don't want them to get to the point where they integrate so heavily with a service (assuming a higher reliability than you have actually promised) that they end up mad at you when it later performs worse, but still as intended.
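To make the error-budget idea concrete, here's a rough sketch, assuming the SLO is expressed as a fraction of time available over a fixed window. The function names and the 99.9% / 2160-hour numbers are illustrative assumptions, not Google's actual targets. The result is an upper bound on how much planned downtime you could schedule without dropping below the target, not a recommendation to spend all of it.

```python
def error_budget_hours(slo: float, window_hours: float) -> float:
    """Total downtime the SLO permits over the window (hypothetical helper)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_hours


def planned_downtime_hours(
    slo: float, window_hours: float, observed_downtime_hours: float
) -> float:
    """Unspent error budget that a planned outage could use (upper bound).

    Returns 0.0 if the service has already used up or overrun its budget,
    because any planned outage would then push it below the SLO.
    """
    remaining = error_budget_hours(slo, window_hours) - observed_downtime_hours
    return max(0.0, remaining)
```

For example, with a 99.9% target over a 2160-hour window (about 90 days) the budget is about 2.16 hours, so `planned_downtime_hours(0.999, 2160, 0.5)` comes out to roughly 1.66 hours of headroom.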

Imagine if, in Python, open('file.txt', 'r') never failed, so no one ever bothered to wrap it in a try block. To prevent that from happening, they would deliberately make open() fail a couple of times.
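Taking the analogy literally, a hypothetical test-only helper could make `open()` fail on purpose so that callers' error handling actually gets exercised. `flaky_open` and `read_config` are made-up names; Python does not do this itself, and patching `builtins.open` like this is only reasonable inside tests.

```python
import builtins
import random
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def flaky_open(failure_rate: float, seed: int = 0) -> Iterator[None]:
    """Temporarily make builtins.open raise OSError with the given probability.

    Hypothetical fault-injection helper for tests; the real open() is
    restored when the block exits, even if an exception escapes.
    """
    rng = random.Random(seed)  # seeded so test runs are reproducible
    real_open = builtins.open

    def maybe_failing_open(*args, **kwargs):
        if rng.random() < failure_rate:
            raise OSError("injected failure")
        return real_open(*args, **kwargs)

    builtins.open = maybe_failing_open
    try:
        yield
    finally:
        builtins.open = real_open


def read_config(path: str, default: str = "") -> str:
    """A caller that handles failure instead of assuming open() always works."""
    try:
        with open(path, "r") as f:
            return f.read()
    except OSError:  # also covers FileNotFoundError, PermissionError, etc.
        return default
```

With `failure_rate=1.0` every call fails, so `read_config("file.txt", default="<defaults>")` inside a `with flaky_open(1.0):` block returns `"<defaults>"`, whether or not the file exists.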
There’s a particular global system that’s very reliable — Global Chubby — and to keep people from putting it in their serving path they just regularly take it down for like an hour per quarter.
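Taking "an hour per quarter" at face value and assuming a quarter of roughly 90 days, that planned outage alone caps measured availability at about 99.95%:

```latex
A = 1 - \frac{1\ \text{h}}{90 \times 24\ \text{h}} = 1 - \frac{1}{2160} \approx 0.99954
```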
Similar concept to Netflix Chaos Monkey.
If you exceed your SLO, people think your service is more reliable than it is.

When your infrequent but expected failures do happen, users are caught by surprise unless you normalize your SLO burn.