by majormajor 1093 days ago
I've seen teams today have just as large an FTE team dedicated to infrastructure as my employer had 10 years ago, to manage a set of cloud services that cost wayyyy more than the colo hardware we had doing basically the same amount of traffic. So people cost hasn't always gone down as infra cost has gone up. "Shitty internal heroku" has some advantages once you have engineers that have been at the company more than 3 months: there's just not nearly as much surface area to understand, and you have the source code right there if you really have to get in there. Let's leave judgements out of this: you can just as easily build a fragile tower of cloud services and abstractions to "feel smart" as you can by reinventing serving an application (reinventing is a strong word anyway... it's usually just a different set of off-the-shelf tools). And in many cases "managed" services manage the easy part (standing up the hardware and installing the service) and don't manage the hard part (configuring the hundreds of knobs to optimize your particular use case) particularly well.

I'm not close enough to the metal for those services/infra teams, though, to be able to completely tell if the FTE hours spent on all that cloud stuff are necessary. That is - can you throw stuff into the cloud and set-it-and-forget-it and not deal with ongoing cloud/infra/config maintenance? But one of the main things those teams seemed to often focus on was spend - if you leave it unattended is the cost just going to eat you up? But then there's a big irony.


My experience with cloud migrations was that you could migrate and forget about it. This was for a high-traffic metro newspaper that was easy to cache for external users but needed to handle a large newsroom using WordPress to manage the content (WordPress can be a huge resource hog).

The main costs in terms of labor end up being making sure that backend services are updated. Same pain you'd feel in a colo, except your cloud provider may force an upgrade you'd otherwise wish to put off. Now, I intentionally kept most things at the VM level of abstraction, because it was clear that the added complexity of something like Kubernetes wasn't worth any savings you might get by needing one less server, and I had enough granularity of healthchecks to just automatically spin down any server that was causing problems and let autoscaling work its magic. This was also 10 years ago, so some choices might make less sense today. YMMV.
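For anyone who hasn't set this up, the "spin down the bad server and let autoscaling replace it" loop can be modeled roughly like this. It's a toy in-memory sketch, not any cloud provider's API: `Instance`, `reconcile`, `check`, `desired`, and `max_failures` are all hypothetical names, and the assumption is that a server is terminated after a few consecutive failed health checks, then the fleet is topped back up to a fixed desired capacity.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory model of a VM fleet; no real cloud API is used.
_ids = count(1)

@dataclass
class Instance:
    id: int = field(default_factory=lambda: next(_ids))
    failed_checks: int = 0  # consecutive failed health checks

def reconcile(fleet, check, desired, max_failures=3):
    """One pass of a hypothetical health-check + autoscaling loop.

    check(instance) -> bool stands in for an HTTP health check.
    Returns (new fleet, ids of terminated instances).
    """
    terminated, survivors = [], []
    for inst in fleet:
        # Reset the counter on success so one blip doesn't kill a server.
        inst.failed_checks = 0 if check(inst) else inst.failed_checks + 1
        if inst.failed_checks >= max_failures:
            terminated.append(inst.id)  # spin down the misbehaving server
        else:
            survivors.append(inst)
    # The "autoscaler" launches fresh instances to restore desired capacity.
    while len(survivors) < desired:
        survivors.append(Instance())
    return survivors, terminated
```

With `desired=3` and one server failing every check, the first two passes terminate nothing; on the third pass that server is dropped and a fresh one takes its place, so the fleet stays at three. Requiring several consecutive failures is the usual way to avoid flapping on a single slow response.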

Also, don't think I'm saying all those people aren't necessary given the current needs of the business. I just knew I was in a very constrained environment, and I prioritized anything that accommodated the constantly shrinking headcount my department was facing.

Just curious, what is the ratio of employee cost vs server cost for the team under you?
I wasn't running them all, but a couple of recent companies had ratios of about 1:4 and 1:2 (employee cost : cloud cost), with cloud spend in the low-to-high hundred Ks annually.

Curiously, the company with the higher cloud bill also had the lower ratio (spending more on both, but proportionally more on engineering); my diagnosis after the fact is that they were building for a "do 10x the scale" future that was never realized but which would've pushed the cloud spend higher without needing more dev spend. In the future, I wouldn't spend so much that far in advance of actually needing to scale.