by maximusdrex 941 days ago
It feels like every commenter on this article didn’t read past the first paragraph. Every comment I see is talking about how they likely barely made any money on the transition once all costs are factored in, but they explicitly stated a critical business rationale behind the move that remains true regardless of how much the transition cost them. Since they needed to function even when AWS is down, it made sense for them to transition even if it cost them more. This may increase the cost of running their service (though probably not), but it could make it more reliable, and therefore a better solution, making them more money down the line.
3 comments

> Since they needed to function even when AWS is down

AWS as a whole has never been down.

It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres). That insulates you not only against data-centre-specific issues, e.g. fire or power loss, but also against AWS backplane software update issues and cascading faults.

If you read what they did it's actually worse than AWS because their Kubernetes control plane isn't highly-available.

People often learn these lessons the hard way: they will keep saving 230k/yr until one day their non-HA bare metal goes down and major customers leave.
> We have a ready to go backup cluster on AWS that can spin up in under 10 minutes if something were to happen to our co-location facility.

Sounds like they already have their bases covered.

Still need to synchronise data, update DNS records, wait for TTLs to expire.

HA architectures exist for a reason: that last step is a massive headache.

They need to run fire drills and practice this, maybe daily or at least weekly, so that failover becomes a normal case. Can’t you do failovers in DNS?
Yes, you can do it in DNS. Update the record to point at your new ingress, then wait for the TTL on the old record to expire, after which new connections move over.

Not all DNS servers properly observe caching timeouts, so some customers may experience longer delays before they see it working again.

A significant percentage of users will still have their DNS resolver chain caching the old host.

Because TTLs are a guide, not mandatory, and many companies/ISPs ignore them for cost reasons.
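To put rough numbers on the TTL point, here is a minimal Python sketch of how long a caching resolver could keep handing out the old address after the record is changed. The `Resolver` class, the `min_ttl` clamp values, and the 60-second record TTL are all made up for illustration; they are not measurements of any real ISP or resolver.

```python
from dataclasses import dataclass


@dataclass
class Resolver:
    """Hypothetical caching resolver.

    min_ttl models a resolver that clamps short TTLs upward;
    0 means the published TTL is honoured as-is.
    """
    name: str
    min_ttl: int = 0  # seconds

    def effective_ttl(self, record_ttl: int) -> int:
        # The resolver caches for whichever is longer.
        return max(record_ttl, self.min_ttl)


def worst_case_stale_seconds(resolver: Resolver, record_ttl: int) -> int:
    """Longest time the old address can be served after the record changes.

    Worst case: the resolver cached the old record just before the update,
    so it keeps that answer for one full effective TTL.
    """
    return resolver.effective_ttl(record_ttl)


if __name__ == "__main__":
    record_ttl = 60  # assumed TTL published on the record, in seconds
    resolvers = [
        Resolver("honours-ttl"),
        Resolver("clamps-to-5-min", min_ttl=300),
        Resolver("clamps-to-1-hour", min_ttl=3600),
    ]
    for r in resolvers:
        stale = worst_case_stale_seconds(r, record_ttl)
        print(f"{r.name}: up to {stale}s still pointing at the old host")
```

Under these assumptions the failover time is set by the slowest resolver in the chain rather than by the TTL you published, which is why a low TTL alone does not guarantee a quick cutover.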

>It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres)

A huge multi billion dollar company with "cloud" in its name recently had a big downtime because they did not follow "cloud 101".

Some AWS outages have affected all AZs in a given region, so they aren't always all that isolated. For this reason many orgs are investing in multi-cloud architectures (in addition to multi-region).
I'm not convinced of the critical business rationale. Your single data center is much more likely to go down than a multi-AZ AWS deployment. The correct business rationale would be to go multi-cloud.
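As a back-of-the-envelope check on the single-site versus multi-site argument, here is a small Python sketch. It assumes each site fails independently and uses a made-up 99.9% per-site availability; neither assumption comes from the article, and the region-wide AWS outages mentioned in this thread show that independence does not always hold.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_site: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is up.

    Assumes failures are independent, which is optimistic: correlated
    failures (e.g. a region-wide outage) make the real figure worse.
    """
    return 1 - (1 - per_site) ** sites


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year with no site available."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_site = 0.999  # illustrative figure, not a measured one
    for n in (1, 2, 3):
        a = combined_availability(per_site, n)
        hours = expected_downtime_hours(a)
        print(f"{n} site(s): availability {a:.9f}, about {hours:.5f} h/yr down")
```

The gap between one site and several shrinks quickly once failures are correlated, which is the case for going multi-region or multi-cloud rather than only multi-AZ.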
For what it's worth I'm not either and absolutely agree with your point, it just felt like that was the more important argument but not the one people were engaging with.
You can use multiple availability zones and, if needed, even multi-cloud. If you own the hardware, you do need to regularly test the UPS to ensure a graceful failover in case of a power outage. Unless, of course, you buy the hardware already hosted in a data centre.