by threeseed 947 days ago
> Since they needed to function even when AWS is down

AWS as a whole has never been down.

It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres), not only to insulate against data-centre-specific issues, e.g. fire or power loss, but also against AWS backplane software update issues or cascading faults.

If you read what they did, it's actually worse than AWS because their Kubernetes control plane isn't highly available.

People often learn this lesson the hard way: they will keep saving 230k/yr until one day their non-HA bare metal goes down and major customers leave.
> We have a ready to go backup cluster on AWS that can spin up in under 10 minutes if something were to happen to our co-location facility.

Sounds like they already have their bases covered.

Still need to synchronise data, update DNS records, wait for TTLs to expire.

HA architectures exist for a reason: that last step is a massive headache.

They need to do fire drills and practice this, maybe daily or at least weekly? Failover should be a normal case. Can't you do failovers in DNS?
Yes, you can do it in DNS. Update the record to point at your new ingress, then wait for the TTL on the old record to expire so that new connections move over.

Not all DNS servers properly observe caching timeouts, so some customers may experience longer delays before they see it working again.

A significant percentage of users will still have their DNS resolver chain caching the old host.

Because TTLs are a guide, not mandatory, and many companies/ISPs ignore them for cost reasons.
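
To make the TTL point concrete, here is a rough sketch of how long a DNS failover can take to reach everyone. It is a toy model, not a description of any real resolver: the `Resolver` class, the `ttl_floor` field (a minimum cache lifetime a resolver imposes regardless of the record's TTL) and all the numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resolver:
    """A caching resolver as modelled in this sketch (hypothetical)."""
    name: str
    cached_at: float        # time (seconds) it last fetched the old record
    ttl_floor: float = 0.0  # minimum cache lifetime it enforces on its own


def stale_seconds(resolver: Resolver, record_ttl: float, switched_at: float) -> float:
    """Seconds after the DNS switch during which this resolver
    still answers with the old address."""
    # A resolver that ignores short TTLs behaves as if the TTL were its floor.
    effective_ttl = max(record_ttl, resolver.ttl_floor)
    expires_at = resolver.cached_at + effective_ttl
    return max(0.0, expires_at - switched_at)


def failover_window(resolvers: list[Resolver], record_ttl: float, switched_at: float) -> float:
    """Time until the slowest resolver in the list stops serving the old record."""
    return max(stale_seconds(r, record_ttl, switched_at) for r in resolvers)


if __name__ == "__main__":
    resolvers = [
        Resolver("compliant", cached_at=970.0),                 # honours the 60 s TTL
        Resolver("clamping", cached_at=990.0, ttl_floor=3600.0),  # caches for at least 1 h
    ]
    for r in resolvers:
        print(r.name, stale_seconds(r, record_ttl=60.0, switched_at=1000.0))
    print("window", failover_window(resolvers, record_ttl=60.0, switched_at=1000.0))
```

With a 60-second TTL, the compliant resolver in this example moves over within 30 seconds of the switch, while the one that enforces a one-hour floor keeps sending users to the dead host for almost the full hour. Lowering your TTL only helps with the first kind.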

>It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres)

A huge multi-billion-dollar company with "cloud" in its name recently had a major outage because they did not follow "cloud 101".

Some AWS outages have affected all AZs in a given region, so they aren't always all that isolated. For this reason many orgs are investing in multi-cloud architectures (in addition to multi-region).