by calvinmorrison 968 days ago
I was an ops engineer at Fastmail. We ran our own hardware: a mix of Debian on one stack and SmartOS (Illumos from Joyent) on the other, and there were plenty of physical problems and costs. Now, petabytes of data with replication and syncing and all that would have been a cluster F on the cloud, but we missed out on a lot of the awesome newer deployment and other tooling because we had written our own. Before I left we swapped out a good chunk of that snowflake software, but it was actually impressive how well it worked. Good multi-datacenter, multi-master MySQL support in 2008? Killer feature. Maintaining that in 2020? Horrible.

There were also plenty of upstream routing issues, and solving them was a headache. The #1 thing we wanted was uptime, and the #1 cause of outages was our upstream providers having trouble routing to other upstream providers.

The number one reason and tradeoff for cloud is uptime and availability and the cost of not having it


> The number one reason and tradeoff for cloud is uptime and availability and the cost of not having it

100%. I ran API networking teams at a Big Tech company and I know the difference. My workload here is hobbyist-plus grade: no controls, no compliance, best-effort SLA. I don't want to ignore the reality that getting enterprise-grade uptime and availability on-prem is really hard.

> The number one reason and tradeoff for cloud is uptime and availability and the cost of not having it

Uptime? There have been quite a few catastrophic cloud failures, and some lasted hours.

Five nines (99.999%) is about 5.26 minutes of downtime a year. The clouds aren't anywhere near that.
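
For reference, the downtime budget for N nines is just the unavailability (10^-N) times the minutes in a year. A quick sketch, assuming a 365-day year and ignoring leap years:

```python
# Yearly downtime budget for "N nines" of availability.
# Assumes a 365-day year (525,600 minutes); a leap year adds about 0.3%.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(nines: int) -> float:
    """Return allowed downtime in minutes/year for availability 1 - 10**-nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):8.2f} min/year")
```

Three nines is roughly 8.8 hours a year, so a single multi-hour outage already blows well past five nines for that year.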

If you use it correctly, with multiple availability zones and even multiple regions, you can reach very high reliability. They surely don't offer five 9s for a single zone. I am not aware of many multi-region outages. They are also getting better over time, spending a lot more engineering hours on reliability than most companies. And if they fail to deliver on their SLA, they might give you money back (depending on your contract, I guess).
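
The reason multiple zones help can be sketched with the textbook parallel-redundancy formula: the service is down only if every zone is down at once. This is a rough model under a big assumption, namely that zone failures are independent and failover is instant. The per-zone figures below are hypothetical, and correlated failures (shared control planes, region-wide routing trouble) are exactly what this model ignores.

```python
from math import prod


def combined_availability(zone_availabilities: list[float]) -> float:
    """Availability of a service that stays up while at least one zone is up.

    Assumes independent zone failures and instant failover, which is
    optimistic: real outages are often correlated across zones.
    """
    all_down = prod(1 - a for a in zone_availabilities)  # P(every zone down)
    return 1 - all_down


# Hypothetical figures: three zones at 99.9% each.
a = combined_availability([0.999, 0.999, 0.999])
print(f"{a:.9f}")
```

On paper that's nine nines from three three-nines zones, which is why the independence assumption, not the arithmetic, is where the real argument is.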

Do you really think on-prem has more uptime than commercial cloud? For non-tech companies, not even close. Commercial cloud is adding nines to uptime and saving money by removing in-house IT admin staff.

The biggest miscalculation is really the expected uptime. People say they need five nines, yet take their car in for service twice a year for a tire rotation and an oil change. How much uptime is _really_ required?