| HN Mirror



	by JMTQp8lwXL 1964 days ago
	You have to know how to write code that fits the cloud. You can't arbitrarily read from and write to the local file system as if only one instance of the server were running, not if you plan to run hundreds or thousands. So even after waving the cloud 'magic wand', you still need to write code in a cloud-friendly way. In that sense, it's a shared responsibility between the vendor and the engineering team: you need to understand how to apply the tools you're given.
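To make the file-system point concrete, here is a minimal sketch, not anything taken from the article. All names (`SharedStore`, `InMemoryStore`, `Server`, `handle_visit`) are hypothetical, and the in-memory store only stands in for whatever external service a real deployment would use. The idea is that each server instance keeps its state behind a shared interface rather than in a file on its own disk.

```python
from typing import Dict, Protocol


class SharedStore(Protocol):
    """Hypothetical interface for state that lives outside any one server instance."""

    def increment(self, key: str) -> int: ...


class InMemoryStore:
    """Stand-in for an external store (a database, cache, or object store).

    It lives in process memory only so that this sketch runs on its own.
    """

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        # Add one to the counter for this key and return the new value.
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


class Server:
    """One server instance; it keeps no state of its own on local disk."""

    def __init__(self, store: SharedStore) -> None:
        self._store = store

    def handle_visit(self, page: str) -> int:
        # All instances update the same counter through the shared store.
        return self._store.increment(f"visits:{page}")


# Two instances, as if running behind a load balancer (an assumption of the sketch).
store = InMemoryStore()
instance_a = Server(store)
instance_b = Server(store)

first = instance_a.handle_visit("home")
second = instance_b.handle_visit("home")
```

Because both instances go through the same store, the second visit sees the first. Had each instance written the count to a file on its own disk, the two would have disagreed.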


tw04 1964 days ago

Per the article, literally nothing in their code would have solved the issue. AWS was supposed to auto-scale the TGWs (Transit Gateways) and didn't.

>Our own serving systems scale quickly to meet these kinds of peaks in demand (and have always done so successfully after the holidays in previous years). However, our TGWs did not scale fast enough. During the incident, AWS engineers were alerted to our packet drops by their own internal monitoring, and increased our TGW capacity manually. By 10:40am PST that change had rolled out across all Availability Zones and our network returned to normal, as did our error rates and latency.

JMTQp8lwXL 1964 days ago

Correct. I was disputing the idea that you can code freely without being mindful of the architecture, even though the selling point of cloud providers is "focus on code, leave architecture to us". I'm not disputing that AWS was at fault in this case: as the customer, Slack did everything right.