by diggan 448 days ago
> A lot of people have convinced themselves that cloud is cheap

I've noticed this too, freelancing/consulting around in companies. I'm not sure where this idea even comes from, because when cloud first started making the news, the reasoning went something like "We're OK paying more since it's flexible, so we can scale up/down quickly", and that made sense. But somehow today a bunch of people (even engineers) believe that cloud is cheaper than the alternatives. That never made sense to me, even when you take into account hiring people specifically to run the infrastructure, unless you're a one-person team or have to scale up/down aggressively over the course of a normal day.

4 comments

I can provide an example where cloud, despite its vastly higher unit costs, makes sense. Analytics in high finance (note: not HFT). Disclosure: my employer provides systems for that.

A fair number of our clients routinely spin up workloads that are CPU-bound on hundreds to thousands of nodes. These workloads can be EXTREMELY spiky, with a baseload for routine background jobs needing maybe 3-4 worker nodes, but with peak uses generating demand for something like 2k nodes, saturating all cores.

These peak uses also tend to be relatively time-sensitive, to the point where having to wait two extra minutes for a result has real business impact. So our systems spin up capacity as needed and, once the load subsides, terminate unused nodes. After all, new ones can be brought up at will. When the peak loads are high (and short) enough, and the baseload low enough, the elastic nature of cloud systems has merit.
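A minimal sketch of the kind of scale-up/scale-down rule described above. It is not the commenter's employer's system: it assumes a hypothetical queue metric measured in pending core-seconds, a fixed node size (`cores_per_node`), a completion target (`target_seconds`), a warm floor (`min_warm`) so small jobs never wait for a cold start, and a hard cap (`max_nodes`). All of these names are illustrative.

```python
import math


def desired_nodes(queued_core_seconds: float,
                  cores_per_node: int,
                  target_seconds: float,
                  min_warm: int,
                  max_nodes: int) -> int:
    """Return how many worker nodes to run right now (hypothetical policy).

    Size the pool so the queued work finishes within target_seconds if it
    saturates every core, but never go below the warm floor or above the cap.
    """
    if queued_core_seconds <= 0:
        return min_warm  # nothing queued: shrink back to the warm floor
    # Nodes needed so that (nodes * cores * target_seconds) covers the queue.
    needed = math.ceil(queued_core_seconds / (cores_per_node * target_seconds))
    return max(min_warm, min(needed, max_nodes))


# Quiet period: only the warm floor of 4 nodes stays up.
print(desired_nodes(0, cores_per_node=64, target_seconds=120, min_warm=4, max_nodes=2000))
# Burst of 15.36 million core-seconds with 64-core nodes and a 2-minute target.
print(desired_nodes(15_360_000, cores_per_node=64, target_seconds=120, min_warm=4, max_nodes=2000))
```

With these illustrative numbers the rule idles at 4 nodes and asks for 2,000 during the burst, which mirrors the 3-4 versus ~2k spread mentioned above. A production scaler would typically also add a cooldown before terminating nodes so it doesn't churn capacity between closely spaced bursts.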

I would note that these are the types of clients who will happily absorb the cross-zone networking costs to ensure they have highly available, cross-zone failover scenarios covered. (E.g., have you ever done the math on just how much zonal egress a busy cross-zone Kafka cluster generates?) They will still crunch the numbers to ensure that their transient workload pools have sufficient minimum capacity to service small calculations without pre-warm delay, while only running at high(er) capacity when actually needed.
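The cross-zone Kafka math can be roughed out with a simple traffic multiplier. This is a sketch under loudly stated assumptions, not a billing calculator: one replica per zone, clients spread evenly across zones, consumers reading from the partition leader (no fetch-from-follower), no compression or protocol overhead, and a hypothetical $0.02 per GB crossing a zone boundary (`ASSUMED_USD_PER_GB`); check your provider's actual price list.

```python
# Hypothetical price per GB crossing a zone boundary (sending + receiving side).
ASSUMED_USD_PER_GB = 0.02
SECONDS_PER_MONTH = 30 * 24 * 3600


def cross_zone_gb_per_month(produce_mb_per_s: float,
                            consumer_groups: int,
                            zones: int = 3,
                            replication_factor: int = 3) -> float:
    """Rough GB/month of inter-zone traffic for one Kafka cluster.

    Simplified model: one replica per zone (replication_factor <= zones),
    clients spread evenly across zones, consumers read from the leader.
    """
    if replication_factor > zones:
        raise ValueError("model assumes at most one replica per zone")
    p_remote = (zones - 1) / zones                 # client and leader in different zones
    producer_to_leader = p_remote
    leader_to_followers = replication_factor - 1   # every follower sits in another zone
    leader_to_consumers = consumer_groups * p_remote
    multiplier = producer_to_leader + leader_to_followers + leader_to_consumers
    return produce_mb_per_s * multiplier * SECONDS_PER_MONTH / 1000  # MB -> GB


gb = cross_zone_gb_per_month(produce_mb_per_s=100, consumer_groups=2)
print(f"{gb:,.0f} GB/month, about ${gb * ASSUMED_USD_PER_GB:,.0f}/month")
```

Under these assumptions, every byte produced crosses a zone boundary about four times (2/3 on produce, 2 for replication, 2/3 per consumer group), so 100 MB/s with two consumer groups comes to roughly a million GB a month, or around $20,000 at the assumed rate. Rack-aware follower fetching, where the cluster supports it, removes most of the consumer-side share.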

Optimising for availability of live CPU seconds can be a ... fascinating problem space.

There are absolutely plenty of spaces where this is true and cloud makes sense either because it's actually cost effective, or because the cost doesn't matter.

Most people aren't in those situations, though, and I think a lot of them believe they're much closer to your scenario than to the much more boring situation they're actually in.

> paying more since it's flexible, so we can scale up/down quickly

I’ve heard this argument too and I think I’ve seen exactly one workload where it actually made sense and was tuned properly and worked reliably.

> I've noticed this too, freelancing/consulting around in companies. I'm not sure where this idea even comes from

Internal company accounting can be weird and lead to unintuitive local optima. At companies I've worked at, what was objectively true was that cloud was often much cheaper than what the IT department would internally bill our department/project for the equivalent service.

I think it's because people think their workloads are extremely spiky, and so assume they will spin up/down loads enough to save money, and that has translated into cloud being perceived as cheap.

But devs rarely pay attention to metrics. I've had clients with expensive Datadog setups where it was blatantly obvious that nobody had ever dug into the performance data, because if they had, they'd have noticed that key metrics were simply not being fed to it.

If they did pay attention, most of them would realise that their autoscaling rarely kicks in much, if at all. Often that's because it's poorly tuned, but also because most businesses see fairly small daily cycles.
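Checking whether autoscaling actually does anything doesn't take much. A minimal sketch, assuming you can export evenly spaced samples of the running instance count (e.g. hourly) from whatever monitoring you already have; `autoscaling_summary` is a hypothetical helper, not part of any monitoring product's API.

```python
from statistics import mean


def autoscaling_summary(node_counts: list[int], min_nodes: int) -> dict[str, float]:
    """Summarise how much a fleet actually scaled.

    node_counts: evenly spaced samples of running instances (e.g. hourly).
    min_nodes: the autoscaling group's configured floor.
    """
    if not node_counts:
        raise ValueError("need at least one sample")
    peak = max(node_counts)
    scaled_out = sum(1 for n in node_counts if n > min_nodes)
    return {
        # Fraction of samples where the fleet was above its floor.
        "share_of_time_scaled_out": scaled_out / len(node_counts),
        # Average fleet size relative to the largest size seen; close to 1.0
        # means the fleet barely changes and there is little elasticity to pay for.
        "mean_over_peak": mean(node_counts) / peak,
    }


# A day of hypothetical samples every 3 hours: the scaler fires twice.
print(autoscaling_summary([4, 4, 4, 5, 4, 4, 6, 4], min_nodes=4))
```

In this made-up day the fleet is above its floor a quarter of the time, and its average size is still about 73% of its peak.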

Factor in that the cost difference between cloud instances and managed servers is quite significant, and you need significant spikes, much shorter in duration than most businesses' day/night variation, to save money.
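The break-even behind that claim can be written down as a rough model that ignores staff, bandwidth and safety headroom. The symbols are illustrative: let $c$ be the monthly cost of one managed server, $r > 1$ the multiple you pay for equivalent cloud capacity, $N_{\text{peak}}$ the number of servers needed at peak, and $\bar{N}$ the average number of cloud instances actually running over the month. The managed fleet has to be sized for the peak; the cloud fleet is billed on its average.

```latex
C_{\text{managed}} = N_{\text{peak}}\, c, \qquad
C_{\text{cloud}} = \bar{N}\, r\, c

C_{\text{cloud}} < C_{\text{managed}}
\iff \bar{N}\, r < N_{\text{peak}}
\iff \frac{\bar{N}}{N_{\text{peak}}} < \frac{1}{r}
```

With a hypothetical $r = 3$, the cloud fleet would have to average less than a third of its peak size. A fleet that runs at full size for 12 hours and half size for the other 12 averages three quarters of peak, nowhere near that.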

It can make sense to be able to spin up more capacity quickly, but then people need to consider that 1) a lot of managed hosting providers have hardware standing by and can automatically provision it for you rapidly too, so unless you insist on only using your own purchased servers in a colo, you can get additional capacity quickly; 2) a lot of managed hosting providers also offer cloud instances, so you can mix and match; and 3) worst case, you can spin up cloud instances elsewhere and tie them into your network via a VPN.

Some offer the full range, from colo through managed servers to cloud instances, in the same datacentres.

Once you prep for a hybrid setup, incidentally, cloud becomes even less competitive, because suddenly you can risk pushing the load factor on your own/managed servers much closer to the wire, knowing you can spin up cloud instances as a fallback. As a result, the cost per request for managed servers drops significantly.
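A minimal sketch of why the load factor matters so much, using hypothetical figures (a $200/month server that can handle 1,000 req/s). It assumes a standalone fleet kept at 40% average utilisation for safety versus a hybrid fleet pushed to 80% because cloud instances absorb the overflow. `cost_per_million_requests` is an illustrative helper, not a standard formula from any provider.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def cost_per_million_requests(server_monthly_usd: float,
                              capacity_rps: float,
                              load_factor: float) -> float:
    """Monthly server cost spread over the requests it actually serves.

    load_factor is the average utilisation you are willing to run at
    (0 < load_factor <= 1); the rest is headroom for spikes.
    """
    if not 0 < load_factor <= 1:
        raise ValueError("load_factor must be in (0, 1]")
    requests = capacity_rps * load_factor * SECONDS_PER_MONTH
    return server_monthly_usd / requests * 1_000_000


# Hypothetical figures: a $200/month server that can handle 1,000 req/s.
standalone = cost_per_million_requests(200, 1_000, load_factor=0.4)  # large safety margin
hybrid = cost_per_million_requests(200, 1_000, load_factor=0.8)      # cloud absorbs overflow
print(f"standalone ${standalone:.4f}, hybrid ${hybrid:.4f} per million requests")
```

Doubling the load factor halves the cost per request on the managed servers. The sketch deliberately leaves out what the cloud overflow costs when it actually fires; the bet is that spikes past the higher load factor are rare and short enough that the occasional burst bill stays below the savings on the baseline.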

I also blame a lot of this on businesses often shielding engineering from seeing budgets and costs. I've been in quite senior positions in a number of companies where the CEO or CFO was flabbergasted when I asked for basic costings of staff and infra, because I saw them as essential to planning out the architecture. Engineers who aren't used to seeing cost as part of their domain will never have a good picture of costs.