by purpleturtle22 1029 days ago
Can someone ELI5 how this differs from using AWS availability zone affinity and then simply dropping the downed AZ at the topmost routing point?

Wouldn't that be the same thing, with the obvious caveat that you aren't using the routing technology Slack is using? (We don't; we use vanilla AWS offerings.)


They decided to use every routing tool available at least once in their setup, so they can't do this. But there is no explanation in the blog about why they use so many platforms and so many routing tools. Sounds to me like they got themselves into a mess and decided to continue on that path.
Somewhere, an engineering “leader” is going to point to this blog post, say, “Well, that’s how Slack did it!”, and promptly copy this overwrought system.
I’m not sure if you’re being serious, but in any case, this will happen, inevitably, as it always does.
The warning becomes the how-to guide.
Cells are not about guarding against AZ failure, but about partitioning the production infra to protect against bad deploys and configuration changes. Every AZ is split into many different cells.
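To make the cell idea concrete, here is a minimal Python sketch, assuming a hypothetical setup where each AZ is split into a fixed number of cells, tenants are pinned to a cell by a deterministic hash, and a new build is rolled out to only a fraction of cells at a time. The names `Cell`, `build_cells`, `cell_for_key`, and `canary_deploy` are illustrative only; this is not Slack's implementation, just one way the "bad deploy only hits some cells" property can arise.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    az: str       # availability zone the cell lives in (hypothetical names)
    index: int    # position of the cell within its AZ
    version: str  # hypothetical: build currently deployed to this cell


def build_cells(azs: list[str], cells_per_az: int, version: str) -> list[Cell]:
    # Every AZ is split into several cells, all starting on the same build.
    return [Cell(az, i, version) for az in azs for i in range(cells_per_az)]


def cell_for_key(key: str, cells: list[Cell]) -> Cell:
    # Deterministic hash, so a given tenant always lands in the same cell slot.
    digest = hashlib.sha256(key.encode()).digest()
    return cells[int.from_bytes(digest[:8], "big") % len(cells)]


def canary_deploy(cells: list[Cell], new_version: str, fraction: float) -> list[Cell]:
    # Push the new build to only the first `fraction` of cells; the rest keep
    # the old build, so a bad deploy is limited to the tenants hashed there.
    # Cells are ordered AZ by AZ, so a small canary lands in the first AZ's cells.
    n = max(1, int(len(cells) * fraction))
    return [
        Cell(c.az, c.index, new_version) if i < n else c
        for i, c in enumerate(cells)
    ]
```

Because the cell list keeps its length and order during a rollout, a tenant's cell slot does not move; only the build running in that slot changes.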
So, guarding against human errors / process failures, and not hardware failures?
Isn’t that exactly what they are doing? They keep requests within an AZ, and instead of using DNS at the first hop into the AZ, they use Envoy to shape traffic and make the initial decision about whether traffic needs to be routed away.
You're doing it right.
Isn’t that exactly what they are doing? Keeping requests within an AZ and using global DNS at the first hop into the AZ.
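The approach discussed in this thread, keeping each request in the caller's AZ and draining a failed AZ at the first routing hop, can be sketched in Python as below. This is a toy model under stated assumptions: the `Edge` class, its `drain`/`undrain`/`pick` methods, and the addresses are hypothetical, there are no health checks, and in practice this decision would live in DNS, a load balancer, or an Envoy configuration rather than in application code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Hypothetical first-hop router that prefers backends in the caller's AZ."""

    backends: dict[str, list[str]]                   # AZ -> backend addresses
    drained: set[str] = field(default_factory=set)   # AZs taken out of rotation

    def drain(self, az: str) -> None:
        # Operator (or automation) marks an AZ as failed at the top routing point.
        self.drained.add(az)

    def undrain(self, az: str) -> None:
        self.drained.discard(az)

    def pick(self, client_az: str, rng=random) -> str:
        # AZ affinity: stay in the caller's AZ while it is healthy.
        if client_az not in self.drained and self.backends.get(client_az):
            return rng.choice(self.backends[client_az])
        # Local AZ drained or empty: spread across every remaining AZ.
        healthy = [
            b
            for az, bs in self.backends.items()
            if az not in self.drained
            for b in bs
        ]
        if not healthy:
            raise RuntimeError("no healthy AZ available")
        return rng.choice(healthy)
```

Draining at the first hop means nothing downstream has to know an AZ is gone; the trade-off is that all failover logic is concentrated in whatever sits at that hop.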