by majormajor
1093 days ago
I've seen teams today with just as large an FTE team dedicated to infrastructure as my employer had 10 years ago, managing a set of cloud services that cost wayyyy more than the colo hardware we had, for basically the same amount of traffic. So people cost hasn't always gone down as infra cost has gone up.
"Shitty internal Heroku" has some advantages once you have engineers who have been at the company more than 3 months: there's just not nearly as much surface area to understand, and you have the source code right there if you really have to get in there.
Let's leave judgements out of this. You can just as easily build a fragile tower of cloud services and abstractions to "feel smart" as you can by reinventing how to serve an application (reinventing is a strong word anyway... it's usually just a different set of off-the-shelf tools). And in many cases "managed" services manage the easy part (standing up the hardware and installing the service) and don't manage the hard part (configuring the hundreds of knobs to optimize for your particular use case) particularly well.
I'm not close enough to the metal for those services/infra teams, though, to be able to tell for sure whether the FTE hours spent on all that cloud stuff are necessary. That is: can you throw stuff into the cloud, set-it-and-forget-it, and not deal with ongoing cloud/infra/config maintenance? But one of the main things those teams often seemed to focus on was spend: if you leave it unattended, is the cost just going to eat you up? But then there's a big irony.
|
The main costs in terms of labor end up being making sure that backend services are updated. Same pain you’d feel in a colo, except your cloud provider may force an upgrade you’d otherwise wish to put off. Now, I intentionally kept most things at the VM level of abstraction, because it was clear that the added complexity of something like Kubernetes wasn’t worth any savings you might get by needing one less server, and I had enough granularity of healthchecks to just automatically spin down any server that was causing problems and let autoscaling work its magic. This was also 10 years ago, so some choices might make less sense today, YMMV.
Also, don’t think I’m saying all those people aren’t necessary based on the current needs of the business. I just knew I was in a very constrained environment and prioritized anything that allowed for the constantly shrinking headcount my department was facing.
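For anyone wondering what "spin down any server that was causing problems and let autoscaling work its magic" looks like in practice, here is a rough sketch of the idea as a toy simulation. Everything in it is an assumption for illustration: `Fleet`, `reconcile`, and `FAILURE_THRESHOLD` are hypothetical names, not any cloud provider's API, and a real setup would lean on the provider's own healthcheck and autoscaling features rather than a loop like this.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict

# Hypothetical value: consecutive failed checks before a server is removed.
FAILURE_THRESHOLD = 3


@dataclass
class Fleet:
    """Toy model of a VM fleet kept at a fixed desired size (illustrative only)."""
    desired: int
    # server id -> number of consecutive failed healthchecks
    failures: Dict[str, int] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def launch(self) -> str:
        """Add a fresh server with a clean healthcheck record."""
        server_id = f"vm-{next(self._ids)}"
        self.failures[server_id] = 0
        return server_id

    def reconcile(self, is_healthy: Callable[[str], bool]) -> list:
        """Run one healthcheck pass; return the ids of servers terminated."""
        terminated = []
        for server_id in list(self.failures):
            if is_healthy(server_id):
                self.failures[server_id] = 0
            else:
                self.failures[server_id] += 1
                if self.failures[server_id] >= FAILURE_THRESHOLD:
                    # Spin down the problem server instead of debugging it live.
                    del self.failures[server_id]
                    terminated.append(server_id)
        # The "autoscaling" step: top the fleet back up to the desired size.
        while len(self.failures) < self.desired:
            self.launch()
        return terminated
```

The point of the design is that nobody has to nurse a misbehaving box: a server that keeps failing checks is simply removed, and the fleet is refilled to its desired size on the same pass.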