This is why any criticism of AWS reliability is meaningless to me. All the cloud providers go down - all of them. Either you are multi-cloud, or you run your own hardware, but these events are inevitable.
The amount of time you are down vs. up dictates your SLOs and SLAs. Criticism of one provider's reliability relative to another's is not only valid, it's backed by hundreds of millions of contractual dollars and credits every year. We spend tens of millions on AWS per year. We have several SLAs with them. Our ElastiCache SLA was breached once (localized to us, not the whole customer base) and we got credits commensurate with the amount of business we lost during that downtime period.
If one provider is down more than the others, the criticism is not only valid, it results in real business loss for the provider and its customers.
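To put rough numbers on "down vs. up", here is a minimal sketch that converts an availability target into a downtime budget and back. The function names and the 30-day period are my own assumptions for illustration, not anything taken from an actual AWS SLA.

```python
from datetime import timedelta

# Assumption: a 30-day measurement window. Real SLAs define their own period.
DEFAULT_PERIOD = timedelta(days=30)


def downtime_budget(availability_pct: float,
                    period: timedelta = DEFAULT_PERIOD) -> timedelta:
    """Maximum downtime allowed in `period` for a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


def observed_availability(downtime: timedelta,
                          period: timedelta = DEFAULT_PERIOD) -> float:
    """Availability percentage actually achieved given measured downtime."""
    if downtime > period:
        raise ValueError("downtime cannot exceed the measurement period")
    return 100 * (1 - downtime / period)
```

With these definitions, `downtime_budget(99.9)` comes to roughly 43 minutes per 30 days, which is the kind of figure you would compare measured downtime against when deciding whether a breach happened.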
On multi-cloud: it's one way to reduce the amount of downtime you have, but it comes with a significant operational cost depending on how your application is architected and how the teams inside your company are organized. It is totally practical for someone to bank on AWS's reliability until they're at a significant amount of traction or revenue where the added uptime of going multi-cloud is worth the investment. I know you're not saying this isn't the case (I think you're saying "do that if you're going to complain about one provider's uptime"), but thought it was worth putting the context into the HN ether.
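As a back-of-the-envelope illustration of that "added uptime", here is a sketch of the best-case arithmetic. It assumes provider outages are fully independent and that you can serve traffic from whichever provider is up, which is optimistic. The `failover_availability` parameter is a hypothetical stand-in for the reliability of your own cross-cloud plumbing.

```python
from math import prod
from typing import Iterable


def combined_availability(availabilities: Iterable[float],
                          failover_availability: float = 1.0) -> float:
    """Best-case availability when any one provider being up is enough.

    Assumes independent failures. `availabilities` are fractions in [0, 1],
    e.g. 0.999 for "three nines". `failover_availability` models the layer
    you operate yourself to route between providers (an assumption).
    """
    # Probability that every provider is down at the same moment.
    all_down = prod(1 - a for a in availabilities)
    return failover_availability * (1 - all_down)
```

Two independent 99.9% providers work out to about 99.9999% on paper. Set `failover_availability` to 0.999 and the result drops below a single provider's 99.9%, which is the operational cost showing up in the numbers.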
You definitely need to look at your SLA with your customers, but in my experience, multi-cloud isn't worth it. It's easier to be slightly less reliable, and throw your top-three cloud provider under the bus in the public post mortem. You'll probably cause bigger outages on your own in between provider outages, and multi-cloud adds another layer of complexity for things to go wrong.
Multi-cloud is saying you think you can manage Kafka across two or three clouds better than GCP can manage Pub/Sub.
One aspect of that is the box in the closet is (in my experience anyway) either up or down. It fails more often, but it fails simpler.
In the cloud, even very small scale apps can run into weird situations like the app server is up, the database is down, and the cache is responding about 50% of the time.
If you don't account for that from the beginning, it can lead to your app displaying some bizarre stuff to users.
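A minimal sketch of what "accounting for that" can look like: treat a flaky cache as a miss, fall back to the database, and return an explicit "unavailable" result when both are down so the UI can show a deliberate error state. `Result`, `fetch`, `cache_get`, and `db_get` are hypothetical names for illustration, not any particular library's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    value: Optional[str]
    source: str  # "cache", "database", or "unavailable"


def fetch(key: str,
          cache_get: Callable[[str], Optional[str]],
          db_get: Callable[[str], str]) -> Result:
    """Read through the cache, tolerating partial failure of either tier."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return Result(cached, "cache")
    except Exception:
        # A cache that errors half the time is treated the same as a miss.
        pass

    try:
        return Result(db_get(key), "database")
    except Exception:
        # Both tiers failed: say so explicitly rather than returning
        # partial or stale data that renders as something bizarre.
        return Result(None, "unavailable")
```

The caller can branch on `source` to decide what to render. A real app would likely also want timeouts and narrower exception types, which this sketch leaves out.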
I haven't run a server locally in 13 years but I can see why some people would miss it.
I've worked at companies that had everything on prem and at cloud companies. There are many nice things about cloud, but reliability is not one of them. Everything is a lot simpler on prem and fails a lot less in my experience. The downside is that scaling is harder. And it can be more expensive, depending on your size.
Right? I can pay extra to have two ISPs for upstream connection, but I have no idea how I'd get a second, totally redundant power connection to the closet in my basement. A UPS battery is only going to last so long, and so is generator fuel.
> This is why any criticism of AWS reliability is meaningless to me.
Is anyone tracking reliability for these public providers? Would be curious how AWS compares to Azure and GCP. My experience is it's better, but we may have avoided Kinesis or whatever that keeps going down.