HN Mirror



	by maximusdrex 989 days ago
	It feels like every comment on this article didn’t read past the first paragraph. Every comment I see is talking about how they likely barely made any money on the transition once all costs are factored in, but they explicitly stated a critical business rationale behind the move that remains true regardless of how much the transition cost them. Since they needed to function even when AWS is down, it made sense for them to transition even if it cost them more. This may increase the cost of running their service (though probably not), but it could make the service more reliable, and therefore a better solution, which would make them more money down the line.


threeseed 989 days ago

> Since they needed to function even when AWS is down

AWS as a whole has never been down.

It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres), not only to insulate against data-centre-specific issues such as fire or power loss, but also against AWS backplane software-update issues and cascading faults.

If you read what they did, it's actually worse than AWS, because their Kubernetes control plane isn't highly available.


wbsun 989 days ago

People often learn these lessons the hard way: they will keep saving 230k/yr until one day their non-HA bare-metal setup goes down and major customers leave.


christophilus 989 days ago

> We have a ready to go backup cluster on AWS that can spin up in under 10 minutes if something were to happen to our co-location facility.

Sounds like they already have their bases covered.


threeseed 989 days ago

You still need to synchronise data, update DNS records, and wait for TTLs to expire.

HA architectures exist for a reason: that last step is a massive headache.


quickthrower2 989 days ago

Shouldn't they run fire drills and practise this, maybe daily or at least weekly, so that failover becomes a routine case? Can’t you do failovers in DNS?


BackBlast 989 days ago

Yes, you can do it in DNS. Update the record to point at your new ingress, then wait for the old record's TTL to expire so that new connections move over.

Not all DNS servers properly observe caching timeouts, so some customers may experience longer delays before they see it working again.
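A minimal sketch of the TTL behaviour described above, assuming a toy caching resolver. The `CachingResolver` class, its `min_ttl` parameter, the `seconds_until_failover` helper, the hostname, and the documentation-range addresses are all hypothetical, used only for illustration. A resolver that honours a 60-second TTL picks up the repointed record after 60 seconds, while one that clamps TTLs upward to an hour keeps serving the old address for the full hour.

```python
from dataclasses import dataclass, field

NAME = "app.example.com"  # hypothetical hostname
OLD_IP = "203.0.113.10"   # primary site (documentation address range)
NEW_IP = "198.51.100.20"  # backup cluster (documentation address range)


@dataclass
class CachingResolver:
    """Toy resolver that caches each answer until its TTL expires (hypothetical)."""
    min_ttl: int = 0  # > 0 models a resolver that ignores short TTLs and clamps them upward
    cache: dict = field(default_factory=dict)  # name -> (address, expires_at)

    def lookup(self, name, zone, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: may be a stale address
        address, ttl = zone[name]  # cache miss: ask the authoritative zone
        self.cache[name] = (address, now + max(ttl, self.min_ttl))
        return address


def seconds_until_failover(resolver, ttl=60, horizon=7200):
    """Seconds after the record is repointed until this resolver returns NEW_IP."""
    zone = {NAME: (OLD_IP, ttl)}
    resolver.lookup(NAME, zone, now=0)  # a client caches the old answer at t=0
    zone[NAME] = (NEW_IP, ttl)          # operator repoints the record right after
    for t in range(horizon):
        if resolver.lookup(NAME, zone, now=t) == NEW_IP:
            return t
    return None  # resolver still stale at the end of the simulated window


print(seconds_until_failover(CachingResolver()))              # 60
print(seconds_until_failover(CachingResolver(min_ttl=3600)))  # 3600
```

Real clients add further layers (browser and OS caches, long-lived connections that never re-resolve), so this only illustrates why a DNS cutover can take much longer than the TTL you set; it is not a model of actual propagation.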


threeseed 989 days ago

A significant percentage of users will still have their DNS resolver chain caching the old host.

TTLs are a guide, not a mandate, and many companies/ISPs ignore them for cost reasons.


slig 989 days ago

>It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres)

A huge multi-billion-dollar company with "cloud" in its name recently had a big outage because it did not follow "cloud 101".


annexrichmond 989 days ago

Some AWS outages have affected all AZs in a given region, so AZs aren't always that isolated. For this reason many orgs are investing in multi-cloud architectures (in addition to multi-region).


icedchai 989 days ago

I'm not convinced by the critical business rationale. A single data center is much more likely to go down than a multi-AZ AWS deployment. The correct business rationale would be to go multi-cloud.


maximusdrex 989 days ago

For what it's worth, I'm not either, and I absolutely agree with your point. It just felt like the more important argument, yet not the one people were engaging with.


oxfordmale 989 days ago

You can use multiple availability zones and, if needed, even go multi-cloud. If you own the hardware, you need to test the UPS regularly to ensure a graceful failover in case of a power outage, unless, of course, you buy hardware that is already hosted in a data centre.
