| HN Mirror

	by purpleturtle22 1029 days ago
	Can someone ELI5 the difference between using AWS availability zone affinity and simply dropping the downed AZ at the topmost routing point? Wouldn't that be the same thing, with the obvious caveat that you aren't using the routing technology Slack is using (we don't; we use vanilla AWS offerings)?
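A minimal sketch of what the question describes: AZ affinity plus a drain switch at the topmost routing layer. It assumes a single router that knows which AZ each request comes from. The class name `AzAffinityRouter`, the AZ IDs, and the host names are hypothetical, and this is not how Slack or any AWS service implements it.

```python
from dataclasses import dataclass, field


@dataclass
class AzAffinityRouter:
    """Hypothetical top-level router: keep traffic in the caller's AZ unless it is drained."""

    # AZ ID -> backend hosts in that AZ (example data, not real infrastructure)
    backends: dict[str, list[str]]
    # AZs an operator has pulled out of rotation at this topmost routing point
    drained: set[str] = field(default_factory=set)
    # Per-AZ round-robin counters
    _next: dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def drain(self, az: str) -> None:
        self.drained.add(az)

    def undrain(self, az: str) -> None:
        self.drained.discard(az)

    def healthy_azs(self) -> list[str]:
        # An AZ is eligible if it has hosts and has not been drained.
        return sorted(az for az, hosts in self.backends.items() if hosts and az not in self.drained)

    def pick(self, client_az: str) -> str:
        healthy = self.healthy_azs()
        if not healthy:
            raise RuntimeError("no healthy AZ available")
        # Affinity: stay in the caller's AZ when possible, otherwise spill to another AZ.
        az = client_az if client_az in healthy else healthy[0]
        hosts = self.backends[az]
        i = self._next.get(az, 0)
        self._next[az] = i + 1
        return hosts[i % len(hosts)]


router = AzAffinityRouter({"use1-az1": ["a1", "a2"], "use1-az2": ["b1"]})
print(router.pick("use1-az1"))  # a1 (stays in its own AZ)
router.drain("use1-az1")
print(router.pick("use1-az1"))  # b1 (drained AZ is skipped)
```

When the caller's AZ is drained, this sketch sends everything to the first healthy AZ in sorted order; a real router would spread the spilled traffic and check capacity before shifting it.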

5 comments

t0mas88 1029 days ago

They decided to use every routing tool available at least once in their setup, so they can't do this. But there is no explanation in the blog about why they use so many platforms and so many routing tools. Sounds to me like they got themselves into a mess and decided to continue on that path.

jonathankoren 1029 days ago

Somewhere, an engineering “leader” is going to point to this blog post, say, “Well, that’s how Slack did it!” and promptly copy this overwrought system.

vinnymac 1029 days ago

I’m not sure if you’re being serious, but either way, this will inevitably happen, as it always does.

sitkack 1029 days ago

The warning becomes the how-to guide.

esprehn 1029 days ago

Cells are not about guarding against AZ failure, but about partitioning the production infra to protect against bad deploys and configuration changes. Every AZ is split into many different cells.
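A rough sketch of the general cell pattern described here, assuming each AZ is split into a fixed number of cells, each tenant key is pinned to one cell by a stable hash, and deploys roll out one cell at a time so a bad change stops early. All names (`build_cells`, `cell_for`, `staged_rollout`, the health check) are hypothetical; this illustrates the pattern, not Slack's implementation.

```python
import hashlib
from typing import Callable, Optional


def build_cells(azs: list[str], cells_per_az: int) -> list[str]:
    # Every AZ is split into several cells, e.g. "az1-cell0", "az1-cell1", ...
    return [f"{az}-cell{i}" for az in azs for i in range(cells_per_az)]


def cell_for(key: str, cells: list[str]) -> str:
    # Pin a tenant key to one cell with a stable hash (not Python's salted hash()).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return cells[int.from_bytes(digest[:8], "big") % len(cells)]


def staged_rollout(
    cells: list[str], version: str, check: Callable[[str, str], bool]
) -> tuple[list[str], Optional[str]]:
    """Deploy `version` one cell at a time and stop at the first cell whose check fails.

    Returns the cells that received the deploy and the failing cell (or None),
    so a bad deploy or config change reaches only the cells touched so far.
    """
    deployed: list[str] = []
    for cell in cells:
        deployed.append(cell)  # the actual deploy to this cell would happen here
        if not check(cell, version):
            return deployed, cell
    return deployed, None


cells = build_cells(["az1", "az2"], 2)
print(cells)  # ['az1-cell0', 'az1-cell1', 'az2-cell0', 'az2-cell1']
print(staged_rollout(cells, "v2", lambda cell, v: cell != "az1-cell1"))
# (['az1-cell0', 'az1-cell1'], 'az1-cell1')
```

Plain modulo hashing remaps most keys whenever the cell count changes; consistent or rendezvous hashing is the usual way to avoid that.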

hliyan 1029 days ago

So, guarding against human errors / process failures, and not hardware failures?

ec109685 1029 days ago

Isn’t that exactly what they are doing? They keep requests within an AZ, and instead of using DNS at the first hop into the AZ, they use Envoy to shape traffic and make the initial decision about whether traffic needs to be routed away.
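The traffic-shaping alternative to a hard DNS cutover can be sketched as weighted routing at the first hop: instead of removing an AZ all at once, its share of traffic is moved to the other AZs in steps. This is a minimal sketch in plain Python, not Envoy configuration; `drain_step`, `choose_az`, and the AZ names are hypothetical, and weights are assumed to sum to 1.0.

```python
import random


def drain_step(weights: dict[str, float], az: str, fraction: float) -> dict[str, float]:
    """Move `fraction` of `az`'s traffic share to the other AZs.

    Moved traffic is split in proportion to the other AZs' current weights
    (evenly if they are all zero). Weights are assumed to sum to 1.0.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    others = [a for a in weights if a != az]
    if not others:
        raise ValueError("cannot drain the only AZ")
    moved = weights[az] * fraction
    total_others = sum(weights[a] for a in others)
    new = dict(weights)
    new[az] = weights[az] - moved
    for a in others:
        share = weights[a] / total_others if total_others > 0 else 1 / len(others)
        new[a] = weights[a] + moved * share
    return new


def choose_az(weights: dict[str, float], rng: random.Random) -> str:
    # Per-request decision at the first hop: sample an AZ by its current weight.
    azs = list(weights)
    return rng.choices(azs, weights=[weights[a] for a in azs], k=1)[0]


w = {"az1": 1 / 3, "az2": 1 / 3, "az3": 1 / 3}
w = drain_step(w, "az1", 0.5)  # az1 ~0.167, az2 and az3 ~0.417 each
w = drain_step(w, "az1", 1.0)  # az1 0.0, az2 and az3 0.5 each
print(choose_az(w, random.Random(0)))  # never "az1" once its weight is 0
```

Draining in steps lets an operator watch error rates and capacity in the receiving AZs before committing the full shift, which a single DNS change does not allow as easily.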

Terretta 1029 days ago

You're doing it right.

ec109685 1029 days ago

Isn’t that exactly what they are doing? Keeping requests within an AZ and using global DNS at the first hop into the AZ.
