Hacker News
by jsmthrowaway 3544 days ago
That's the sort of capacity planning and management an operations chief would do for you several quarters out, based on experience with shifting business needs that they have acquired over a career of dealing with highly specific use cases.

Think of it as putting an intelligent layer between your demand metrics and your server fleet. I live for utilization, just like you live for your product. Hire an operations nerd who does, and your company will be much better off for it. Based on your description, it sounds like you or the other engineers are already involved in operations anyway, so you probably won't need two. Hire one and let them tell you.


So you're saying that we have to rely on a magic wizard to guess our future capacity?

Whatever. We'll just redo the colo vs GCE comparison, accounting for the wizard fee and a margin of error on his future guesses:

- That should be at least $200k/year (USA) or £100k/year (UK) for that kind of skill. How many such wizards do we need? That price is for one.

- Add 100% to the expected hardware costs. Hardware is cheap; what is expensive is buying the wrong hardware and having to buy it again.
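The comparison in the two points above boils down to a few lines of arithmetic. This is a minimal sketch with hypothetical inputs: the function names, the amortized yearly hardware figure, and the GCE bill are assumptions for illustration, not numbers from this thread. Only the $200k salary and the 100% hardware margin come from the comment.

```python
def colo_annual_cost(hardware_per_year, wizard_salary, wizards=1, hardware_margin=1.0):
    """Yearly colo cost: amortized hardware plus a safety margin, plus planner salaries.

    hardware_margin=1.0 means "add 100%" to the expected hardware spend.
    """
    padded_hardware = hardware_per_year * (1 + hardware_margin)
    return padded_hardware + wizard_salary * wizards


def cheaper_option(colo_cost, gce_cost):
    """Name the cheaper option; ties go to GCE."""
    return "colo" if colo_cost < gce_cost else "GCE"


# Hypothetical inputs: $150k/year amortized hardware and one $200k/year planner,
# compared against a hypothetical $450k/year GCE bill.
colo = colo_annual_cost(hardware_per_year=150_000, wizard_salary=200_000)
print(colo, cheaper_option(colo, 450_000))  # prints: 500000.0 GCE
```

Changing `wizards` or `hardware_margin` shows how sensitive the outcome is to those two assumptions.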

Would you denigrate a software engineer by calling them a magic wizard you have to rely on for their knowledge, the way you're describing someone with ops/capacity planning experience?

Experience and knowledge aren't wizardry or magic, but as a business owner you're free to burn your cash on pride if that's what you'd like to do. Not every problem's solution is a Google search or API call away.

I am an SRE. I've done software engineering for years, and I currently do ops and a lot of capacity planning. I have no intention of denigrating anyone.

The word you should have focused on was 'magic'. In the same way that magic does not exist, it is not possible to plan future capacity with accuracy. It can get even worse if the system is small, because the time spent analysing and planning can quickly eat up the savings.
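The small-system argument is a break-even comparison: planning only pays off if the savings it produces exceed what the planning itself costs. A minimal sketch, where the fleet sizes, savings fraction, hours, and hourly rate are all hypothetical figures, not numbers from the thread:

```python
def planning_pays_off(fleet_cost_per_year, savings_fraction, planning_hours, hourly_rate):
    """True if the expected yearly savings from planning exceed the cost of doing it."""
    expected_savings = fleet_cost_per_year * savings_fraction
    planning_cost = planning_hours * hourly_rate
    return expected_savings > planning_cost


# Same (hypothetical) planning effort and savings fraction, two fleet sizes.
for fleet in (50_000, 2_000_000):
    print(fleet, planning_pays_off(fleet, savings_fraction=0.10, planning_hours=80, hourly_rate=100))
```

With these made-up numbers, the 80 hours of planning cost more than 10% of the small fleet's yearly spend but far less than 10% of the large one's.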

The only way to make good predictions is to already have the systems running in production [for months], with a [mostly] static user base, running applications that don't evolve, and never adding any services. Under those circumstances, we could have good metrics on current usage and make good predictions about the future... except that, to get to that point, the hardware must already have been bought.
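The kind of prediction described here can be sketched as a straight-line fit over months of production history, extrapolated forward. This is a minimal sketch assuming linear growth and one sample per month (both assumptions, and the history values are hypothetical). It only holds under the steady conditions the comment lists, since a new service or a shift in the user base breaks the trend.

```python
def fit_linear_trend(samples):
    """Ordinary least-squares fit of y = intercept + slope * t, with t = 0, 1, ..., n-1."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(samples))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(samples, months_ahead):
    """Extrapolate the fitted line months_ahead past the last sample."""
    intercept, slope = fit_linear_trend(samples)
    return intercept + slope * (len(samples) - 1 + months_ahead)


# Hypothetical monthly peak requests/sec over six months of production.
history = [100, 110, 120, 130, 140, 150]
print(forecast(history, months_ahead=6))  # prints 210.0
```

The fit needs `history` to exist before it can say anything, which is the commenter's point: the data comes from hardware that is already running.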

> In the same way that magic does not exist, it is not possible to plan future capacity with accuracy.

I...what? Capacity Planning for Unpredictable Growth is literally SRE 103. It's hard and will have a margin of error, yes, but it's not bloody magic. The "magic" is in identifying and collecting the correct metrics that somewhat model the abstracted utilization of the property, because almost everyone picks the wrong ones. This is situational, so there is no blanket advice to offer, except that load average is almost certainly the wrong metric, and so is consulting only one metric. If you're planning capacity from five or six key application metrics, you're probably on the right path.
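One way to work from several application metrics at once is to project each one forward and let whichever runs out of headroom first drive the purchase. This is a minimal sketch, not a standard method: the metric names, capacities, compound monthly growth rates, and the 20% safety margin (standing in for the margin of error mentioned above) are all hypothetical. Load average is deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    current: float         # current peak value of the metric
    capacity: float        # value at which the fleet saturates on this metric
    monthly_growth: float  # assumed compound growth per month, e.g. 0.05 for 5%


def months_until_saturation(m: Metric, safety_margin: float = 0.2) -> float:
    """Months until the metric reaches capacity * (1 - safety_margin) under compound growth."""
    limit = m.capacity * (1 - safety_margin)
    if m.current >= limit:
        return 0.0
    if m.monthly_growth <= 0:
        return math.inf
    return math.log(limit / m.current) / math.log(1 + m.monthly_growth)


def binding_metric(metrics, safety_margin=0.2):
    """Return (name, months) for the metric that runs out of headroom first."""
    return min(
        ((m.name, months_until_saturation(m, safety_margin)) for m in metrics),
        key=lambda pair: pair[1],
    )


# Hypothetical application metrics for a single service.
fleet = [
    Metric("requests_per_sec", current=4000, capacity=10000, monthly_growth=0.05),
    Metric("db_write_iops", current=3000, capacity=5000, monthly_growth=0.04),
    Metric("cache_memory_gb", current=200, capacity=256, monthly_growth=0.02),
    Metric("egress_gbps", current=2, capacity=10, monthly_growth=0.10),
]
name, months = binding_metric(fleet)
print(f"{name}: {months:.1f} months")  # prints cache_memory_gb: 1.2 months
```

Taking the minimum across metrics is what makes consulting only one metric risky: with these numbers, request rate alone would suggest over a year of headroom, while cache memory runs out in about a month.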

SRE is, quite specifically, application operations engineering. If you can't model your application's growth, I'd be more inclined to call you an SA. (There's absolutely nothing wrong with that, to be clear, even speaking as an SRE. And I am aware several Valley companies are diluting the term.)