by cheeze 1117 days ago

The idea that "we'll just seamlessly fail over to another provider" is a bit optimistic IMO. With that comes additional complexity. Some applications need this, but it's a huge cost and complexity tradeoff for almost all businesses.

I'm a fan of sticking with one provider, but going with something bigger that has a good track record. AWS, GCP, and Azure aren't immune to outages, but I think for almost all companies, having redundant stacks in separate regions is enough to maintain high availability.

I don't know enough about Oracle Cloud to comment on them, but my general take is that these companies all inevitably hit a "showstopper" global outage, realize they aren't investing enough in the separation of regional stacks, and put a ton of energy into making their platforms more fault tolerant.

Thinking that Johnny dev shop is going to be able to do better than a major player is, IMO, wishful thinking.

I know that at GCP at least, they actually have monitoring set up for things like tweets, downdetector, etc. Ideally they catch every issue with their own monitoring, but they do their best to know if anyone is having an issue, whether they can detect it or not.