by jsmthrowaway 3546 days ago
As an operations employee, I'm shocked you've run thousands of servers in some kind of service while talking yourself out of any operations employees. An operations hire is not a prerequisite to moving past dedicated; even with nothing but 200 dedicated servers you are way past the point of needing at least minimal operations. Contract this out if you have to.

We are not a direct cost center that can be discussed in those terms. Our insight will reduce capital and operational expenditure beyond our salary, because that operations hire would have told you how insane of an idea paying $22,000/mo for four cabinets of gear is and why a capital tradeoff with depreciation is a fiduciary responsibility to your investors and shareholders. You can buy at least a dozen U for that each month and then pay for nothing but where it lives with a dash of break-fix to taste.

I can put four cabinets of gear in a colocation for a quarter or less than that if you'd swing a little capital. You are wasting money on poor operations architecture and design and you don't have anybody to really tell you.

Even beyond that operations is a skill, much like marketing. I know a lot of people think they can fake it for a while (and they usually can), but after a point it's time to act like a grown up company and bring someone who does nothing but think about this shit on board. Security, performance, remediation, all the system level grunt work you shouldn't be concerning yourself with as engineers. Or you can keep throwing multiple operations salaries at your four cabinet OVH deal and keep getting ripped off.


We have no dedicated servers, as we're running everything on GCE. Our use case is highly specific: we have huge network ingress requirements, and we get that ingress free of charge.

> "You can buy at least a dozen U for that each month and then pay for nothing but where it lives." We don't need a dozen a month. We need hundreds now. We may not need them in 6 months, though, and then what? Will I rent them out?

That's the sort of capacity planning and management an operations chief would do for you several quarters out, based on experience with shifting business needs that they have acquired over a career of dealing with highly specific use cases.

Think of it as putting an intelligent layer between your demand metrics and your server fleet. I live for utilization, just like you live for your product. Hire an operations nerd who does and your company will be much better off for it; based on your description, it sounds like you or the other engineers are already involved in operations anyway, so you probably won't need two. Hire one and let them tell you.

So you're saying that we have to rely on a magic wizard to guess our future capacity?

Whatever. We'll just redo the colo vs GCE comparison, accounting for the wizard fee and a margin of error on his future guess:

- Should be at least $200k/year (USA) or £100k/year (UK) for that kind of skill. How many such wizards do we need? This price is only for one.

- Add 100% to the expected hardware costs, because hardware is cheap; what is expensive is buying the wrong hardware and having to buy it again.
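The comparison proposed above can be sketched as back-of-the-envelope arithmetic. This is only a rough sketch: the function names, the three-year straight-line depreciation, and the $150,000 hardware figure are hypothetical placeholders, not numbers from this thread. The $22,000/mo rental, the "quarter of that" colo fee, the $200k salary, and the 100% hardware margin are the figures quoted in the comments.

```python
def annual_rental_cost(monthly_fee):
    """Yearly cost of renting everything, with no capital outlay."""
    return monthly_fee * 12


def annual_colo_cost(hardware_capex, depreciation_years, colo_monthly_fee,
                     ops_salary, hardware_margin=1.0):
    """Yearly cost of owning hardware in a colocation facility.

    hardware_margin=1.0 pads the hardware budget by 100% to cover the
    risk of buying the wrong gear. Straight-line depreciation is an
    assumption made for this sketch.
    """
    padded_capex = hardware_capex * (1 + hardware_margin)
    yearly_depreciation = padded_capex / depreciation_years
    return yearly_depreciation + colo_monthly_fee * 12 + ops_salary


# Figures from the thread: $22,000/mo rental, colo at a quarter of that,
# one $200k/year operations hire, 100% hardware margin.
# Hypothetical: $150,000 of hardware depreciated over 3 years.
rental = annual_rental_cost(22_000)
colo = annual_colo_cost(150_000, 3, 22_000 / 4, 200_000)
print(f"rental: ${rental:,.0f}/year, colo: ${colo:,.0f}/year")
```

The result swings entirely on the inputs: how many cabinets one salary is spread across, what the hardware really costs, and whether the operations hire also trims the rental bill. Rerun it with your own numbers before drawing a conclusion either way.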

Would you denigrate a software engineer by saying you had to rely on them as a magic wizard for their knowledge the same way you're referring to someone with ops/capacity planning experience?

Experience and knowledge aren't wizardry or magic, but as a business owner you're free to burn up your cash on pride if that's what you'd like to do. Not every problem's solution is a Google search or API call away.

I am an SRE, I've done software engineering for years, and I do ops and lots of capacity planning at the moment. I have no intention to denigrate anyone.

The word you should have focused on was 'magic'. In the same way that magic does not exist, it is not possible to plan future capacity with accuracy. It can even get worse if the system is of limited size because the time spent on analysing and planning can quickly cover the savings.

The only way to have good predictions is to already have the systems running in production [for months], with a [mostly] static user base, running applications that don't evolve, and never adding any additional service. Given these circumstances, we could have good metrics on current usage and make good predictions about the future... except that to be at this point the hardware has had to be bought already.

> In the same way that magic does not exist, it is not possible to plan future capacity with accuracy.

I...what? Capacity Planning for Unpredictable Growth is literally SRE 103. It's hard and will have a margin of error, yes, but it's not bloody magic. The "magic" is in identifying and collecting the correct metrics that somewhat model the abstracted utilizations of the property, because almost everyone picks the wrong ones; this is situational, so there is no blanket advice to offer except that load average is almost certainly the wrong metric, and so is relying on any single metric. If you're working capacity from five or six key application metrics you're probably on the right path.
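To make the "several application metrics plus a margin of error" idea concrete, here is a minimal sketch. It assumes weekly samples and a plain linear trend, which is a simplification chosen for illustration and not a method anyone in this thread described. The metric names, limits, and the 20% headroom are all hypothetical.

```python
def fit_line(samples):
    """Least-squares line through equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_limit(samples, limit, headroom=0.2):
    """Weeks after the last sample until the trend reaches limit * (1 - headroom).

    Returns None when the metric is flat or shrinking.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    target = limit * (1 - headroom)
    crossing = (target - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


def binding_metric(metrics, headroom=0.2):
    """Return (name, weeks) for the metric that runs out of room first."""
    runway = {}
    for name, (samples, limit) in metrics.items():
        weeks = weeks_until_limit(samples, limit, headroom)
        if weeks is not None:
            runway[name] = weeks
    if not runway:
        return None
    first = min(runway, key=runway.get)
    return first, runway[first]


# Hypothetical weekly samples and capacity limits for three metrics.
metrics = {
    "requests_per_sec": ([100, 110, 120, 130], 250),
    "db_connections": ([50, 60, 70, 80], 200),
    "queue_depth": ([10, 10, 10, 10], 100),
}
print(binding_metric(metrics))
```

The point of tracking several metrics is that the one that exhausts capacity first is often not the one you would have guessed; the headroom parameter is where the margin of error lives.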

SRE is, quite specifically, application operations engineering. If you can't model your application's growth I'd be more inclined to call you an SA. (There's absolutely nothing wrong with that, to be clear, even speaking as an SRE. And I am aware several valley companies are diluting the term.)