by doh 3546 days ago
Depends on where you're based. Ops salaries in the Bay Area are above $100k/year, which needs to be added to the price of the servers. Also, a team of one has a bus factor of one, so you essentially need two people (in case one is sick or on vacation). Now you're at $200k+ annually without having a single server installed.

200 servers (at least the ones we were interested in) would cost us around $22k on OVH each month. That means if I subtract the personnel I would otherwise have to hire, the effective cost of the servers is down to around $6k/month ($22k minus $200k/12). For that money you can't really find a better option.
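The arithmetic behind that figure can be sketched as follows. This is only a rough check using the numbers quoted above; the function and variable names are illustrative and not from the original comment.

```python
# Figures quoted in the comment above; names are illustrative assumptions.
OVH_MONTHLY_USD = 22_000           # ~200 servers rented from OVH
OPS_SALARIES_ANNUAL_USD = 200_000  # two ops hires at roughly $100k each


def effective_monthly_cost(rental_monthly, avoided_salaries_annual):
    """Rental cost minus the monthly share of salaries you no longer pay."""
    return rental_monthly - avoided_salaries_annual / 12


net = effective_monthly_cost(OVH_MONTHLY_USD, OPS_SALARIES_ANNUAL_USD)
print(f"avoided salaries: ${OPS_SALARIES_ANNUAL_USD / 12:,.0f}/month")
print(f"effective server cost: ${net:,.0f}/month")
```

With these inputs the result is about $5.3k/month, in the same ballpark as the figure quoted above.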

In our case, where we are running thousands of servers at any given time, flexibility is much more important than price. So we built our service around preemptible instances on GCE (the equivalent of spot instances on AWS). You can't beat a dedicated server on performance, but it's close enough, and they make up for it with great infrastructure.


As an operations employee, I'm shocked you've run thousands of servers in some kind of service while talking yourself out of any operations employees. An operations hire is not a prerequisite to moving past dedicated; even with nothing but 200 dedicated servers you are way past the point of needing at least minimal operations. Contract this out if you have to.

We are not a direct cost center that can be discussed in those terms. Our insight will reduce capital and operational expenditure beyond our salary, because that operations hire would have told you how insane of an idea paying $22,000/mo for four cabinets of gear is and why a capital tradeoff with depreciation is a fiduciary responsibility to your investors and shareholders. You can buy at least a dozen U for that each month and then pay for nothing but where it lives with a dash of break-fix to taste.

I can put four cabinets of gear in a colocation for a quarter or less than that if you'd swing a little capital. You are wasting money on poor operations architecture and design and you don't have anybody to really tell you.

Even beyond that, operations is a skill, much like marketing. I know a lot of people think they can fake it for a while (and they usually can), but after a point it's time to act like a grown-up company and bring someone on board who does nothing but think about this shit. Security, performance, remediation, all the system-level grunt work you shouldn't be concerning yourself with as engineers. Or you can keep throwing multiple operations salaries at your four-cabinet OVH deal and keep getting ripped off.

We have no dedicated servers, as we're running everything on GCE. Our use case is highly specific: we have huge ingress network requirements, and ingress is free of charge for us.

> "You can buy at least a dozen U for that each month and then pay for nothing but where it lives." We don't need a dozen a month. We need hundreds now. We may not need them in 6 months, though, and then what? Will I rent them out?

That's the sort of capacity planning and management an operations chief would do for you several quarters out, based on experience with shifting business needs that they have acquired over a career of dealing with highly specific use cases.

Think of it as putting an intelligent layer between your demand metrics and your server fleet. I live for utilization, just like you live for your product. Hire an operations nerd who does too and your company will be much better off for it; based on your description it sounds like you or the other engineers are already involved in operations anyway, so you probably won't need two. Hire one and let them tell you.

So you're saying that we have to rely on a magic wizard to guess our future capacity?

Whatever. We'll just redo the colo vs GCE comparison, accounting for the wizard fee and a margin of error on his future guess:

- Should be at least $200k/year (USA) or £100k/year (UK) for that kind of skill. How many such wizards do we need? This price is only for one.

- Add 100% to the expected hardware costs, because hardware is cheap; what is expensive is buying the wrong hardware and having to buy it again.
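A minimal sketch of that redone comparison, for anyone who wants to plug in their own numbers. Only the $200k/year salary, the 100% hardware margin, and the "quarter of $22k" colocation fee come from this thread; the hardware purchase price and the amortization period below are purely hypothetical placeholders, as are all the names.

```python
def colo_monthly_cost(hardware_capex, amortization_months, colo_fee_monthly,
                      ops_salary_annual, hardware_margin=1.0):
    """Monthly colocation cost with a planning-error margin on hardware.

    hardware_margin=1.0 means "add 100%", i.e. budget for buying twice.
    """
    hardware_monthly = hardware_capex * (1 + hardware_margin) / amortization_months
    ops_monthly = ops_salary_annual / 12
    return hardware_monthly + colo_fee_monthly + ops_monthly


# Hypothetical inputs: $240k of gear amortized over 36 months.
# From the thread: colo at a quarter of $22k/month, one $200k/year hire.
total = colo_monthly_cost(
    hardware_capex=240_000,        # assumption, not from the thread
    amortization_months=36,        # assumption, not from the thread
    colo_fee_monthly=22_000 / 4,
    ops_salary_annual=200_000,
)
print(f"colo with margin and ops hire: ${total:,.0f}/month")
```

The outcome depends almost entirely on the two assumed inputs, so treat it as a template for the comparison rather than a verdict on either side.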

Would you denigrate a software engineer by saying you had to rely on them as a magic wizard for their knowledge the same way you're referring to someone with ops/capacity planning experience?

Experience and knowledge aren't wizardry or magic, but as a business owner you're free to burn up your cash on pride if that's what you'd like to do. Not every problem's solution is a Google search or API call away.

I am an SRE, I've done software engineering for years, and I do ops and lots of capacity planning at the moment. I have no intention of denigrating anyone.

The word you should have focused on was 'magic'. In the same way that magic does not exist, it is not possible to plan future capacity with accuracy. It can even get worse if the system is of limited size because the time spent on analysing and planning can quickly cover the savings.

The only way to have good predictions is to already have the systems running in production [for months], with a [mostly] static user base, running applications that don't evolve, and never adding any additional service. Given these circumstances, we could have good metrics on current usage and make good predictions about the future... except that to be at this point the hardware has to have been bought already.