by trabant00 1639 days ago
I have a rule that is simple, effective, but also quite rude: if you can't deliver and maintain a 500-instance infrastructure, with the same uptime and all, at half the cost of AWS, by yourself (1 person), in 3 months, using only open source solutions, then you basically should not have an opinion about this. You are just rationalizing your incompetence on this particular subject. Sorry to be this blunt, but I am simply tired of listening to people who can't do it explain that what I do every day can't or shouldn't be done.

I don't consider this rude or blunt, but rather incomplete: I'm really not sure which points frustrate you or what you hope someone takes away from it. I'm an outside observer on the cloud subject; I have seen huge debates over the use of the public cloud, both internally at my company and at client companies.

I've seen public cloud bills absolutely demolish an IT org's yearly budget in a single month because of unexpected cost spikes, and I've also seen the cloud reduce total cost of ownership by cutting licensing, staff, and building costs. I get both sides of what the public cloud can do.

I've also seen what you can do on-premises: I've worked with clients who manage 7000+ machines (mostly virtual, some physical) with a team of 4 using pretty reasonably priced on-site hardware. (Pro tip, I guess: Hitachi boxes are absurdly great servers with fantastic uptime, pockmarked only by an absolutely horrendous management UI.)

My experience from the many clients I work with is that it is less about the specific stack you settle on and more about your comfort level in getting the most efficiency out of it. The deeper and more intimately you know all levels of your infrastructure, the better you know how to eke the most out of every single cent you spend on it.

When I'm frustrated I'm not exactly clear in my writing.

You need to be able to do both options before having an opinion on which is appropriate in which case. I am surprised to have to state this. But in my experience, people argue for one option a lot without being able to deliver the other.

People who know bare metal are rare these days among the available infrastructure engineers (call them sysadmins, devops, etc.). I guess this partly justifies companies looking at the cloud. But if you really search, you can find engineers earning under 100k per year who can deliver 100k per month in savings compared to AWS.

There are also engineers who stayed away from the cloud and can't deliver that option, though they are a lot rarer. They are just as wrong if they argue against the cloud from ignorance.

The right choice for serious infrastructures these days is always both. Keep the bulk on-premises for steady loads and 95% of features, and expand to public clouds for dynamic scaling and for features you don't want to do yourself, at least not yet. This combination offers good costs, flexibility, covers possible future needs, etc.
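A rough way to see why the hybrid split can pay off: own enough capacity for the steady baseline and rent only the overflow. This is a minimal sketch under loudly hypothetical numbers (a flat $30/month per on-prem instance unit, $0.10/hour per cloud unit, and a made-up month of demand); it ignores hardware lead times, staffing, and egress.

```python
"""Sketch: monthly cost of a hybrid setup where on-prem capacity covers a
fixed baseline and the cloud absorbs any overflow. All prices and demand
figures are hypothetical placeholders."""

from typing import Sequence


def hybrid_monthly_cost(
    hourly_demand: Sequence[int],
    onprem_capacity: int,
    onprem_cost_per_unit_month: float,
    cloud_cost_per_unit_hour: float,
) -> float:
    """Total cost for one month of hourly demand, measured in instance units.

    On-prem capacity is paid for whether it is used or not; demand above
    that capacity is served from the cloud at an hourly rate.
    """
    fixed = onprem_capacity * onprem_cost_per_unit_month
    overflow_unit_hours = sum(max(0, d - onprem_capacity) for d in hourly_demand)
    return fixed + overflow_unit_hours * cloud_cost_per_unit_hour


if __name__ == "__main__":
    # Hypothetical month: 20 units of steady load, spiking to 50 for 40 hours.
    demand = [20] * 680 + [50] * 40
    for capacity in (0, 20, 50):
        cost = hybrid_monthly_cost(demand, capacity, 30.0, 0.10)
        print(f"on-prem capacity {capacity:>2}: ${cost:,.2f}")
```

With these placeholder prices, sizing on-prem to the steady load (20 units) comes out cheaper than either all-cloud (0) or provisioning on-prem for the peak (50); different prices or demand shapes can move that optimum.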

> People who know bare metal are rare these days from the total of available infrastructure engineers

Sysadmins are not rare; they're just not the people you hear about in the Silicon Valley bubble anymore. 90+% of businesses haven't moved to the "cloud" (i.e. whoever the fuck's computer you can't get your hands on in case of problems), and even if they wanted to, it would make no sense: most businesses just need a basic website and an email/accounting service. Cloud abstractions add a lot of complexity and zero benefit for such use cases.

> But in my experience people argue one option a lot without being [able] to deliver the other.

I'm in this box. I can't deliver "cloud" computing, and from a political perspective I refuse to "learn". Also, it makes no sense for the non-profit projects I work with: the biggest ones need at most a few servers, which are still manageable by hand and certainly easier to deal with via Ansible/Chef than via new layers of abstraction with all their new failure modes (e.g. k8s/AWS).

> most businesses just need a basic website and an email/accounting

I think those businesses should definitely go to the Cloud - but not IaaS. Use Microsoft 365 or Google Workspace for the email needs and a Website-as-a-service vendor, whether that’s wordpress.com or Webflow.

I'm not saying you're wrong, but why do you call that the cloud again? Shared ("mutualized") hosting is what we've been doing since "forever".

The part where I disagree: don't go with Microsoft or Google, they're the worst. They've got less-than-stellar service, abysmal support, and they're capitalist assholes. Go with a local tech coop or non-profit (or even just a local tech artisan for-profit company) with friendly support.

I think it's been said in many other threads, but it's always worth repeating: by using Microsoft/Google email services, you help make it impossible for others to run a mail solution of their choice, because their mail gets blocked despite a perfect server configuration.

Thank you for sharing this, it helps clear up your concern a lot.

I deal with some big German clients fairly frequently, and one of their requirements is that "[they] own the entire stack top to bottom, back to front." A lot of the dark-site operations I work on share a similar requirement, which is really why I'm far more open to and comfortable with an all-on-premises setup: I see scaling done there without any public cloud.

From what I do work on in the public cloud, sure, I absolutely get why scaling is so easy there if you don't already have a good team to build and orchestrate a local setup. I also see some big-name companies I contract with just throw money onto a fire fueled by Azure, and while the expenditure hurts, it's still considered acceptable.

I guess I probed because I see a lot of different sides of modern architecture, and aside from it being well documented and disciplined, I'm not sure there's one right answer in modern architecture, just different comfort zones with different efficiencies.

There are few things we want to do on-premises anymore. The main problem with on-premises, and the benefit of the cloud, is that we can add new capacity at a moment's notice. You never have to wonder whether you'll need to add more capacity (with two-month lead times) to provision a database.

Now you could say that infra teams that do not anticipate such a need are less than ideal, and I’d agree with you, but I haven’t been part of them and I imagine they have their own issues to deal with.

As a dev, the cloud means I don't have to worry about infra teams, since they're not our problem (beyond the ones managing the cloud environment).

There are certainly companies like yours that are perhaps web-product driven and need flexible scalability, but there are many out there with little need for such scaling, at least not unexpectedly, which will run perfectly fine on on-premises virtualisation.

The poster above is right, both have their purpose, but those sold on the cloud as the complete solution are kidding themselves in most cases, happy to accept crazy cloud cost blow-outs rather than over-provisioning tin or thinking properly about the use cases.

It honestly sounds like you don't care about efficiency, because of either good cash inflows or a need to move extremely fast. Such is the appeal of the cloud...

You are actually right. Thinking back to other companies I've worked for, only the last two had any need for the cloud; the others had a more or less stable workload that was ideal for on-premises. They were also all between 5 and 200 employees; I wonder if that matters.

Sub-100k engineers are a fiction. Sure, you could get somebody on staff for under 100k in salary, but it's not necessarily going to be someone competent.

But even aside from that: OK, you found somebody who agrees to work for 90k. What about social security tax? Group health coverage? Workman's comp insurance? HR support? Payroll? Risk of lawsuits if someone hurts them/their feelings?

Thing is: many AWS customers have a sub-100k/mo bill, so the savings this sub-100k person can deliver will be proportionally smaller.
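The two points above combine into a simple break-even check: the smaller the bill, the larger the share of it one engineer has to cut just to cover their own loaded cost. A minimal sketch; the 90k salary comes from the comment above, while the 30% overhead rate (payroll tax, health coverage, insurance, HR) is an assumption, not a quoted figure.

```python
"""Sketch: what share of a monthly cloud bill one in-house engineer must
save just to pay for themselves. The overhead rate is an assumption."""


def loaded_annual_cost(base_salary: float, overhead_rate: float) -> float:
    # overhead_rate bundles payroll tax, health coverage, insurance, HR, etc.
    return base_salary * (1.0 + overhead_rate)


def breakeven_savings_fraction(monthly_bill: float, annual_engineer_cost: float) -> float:
    """Fraction of the monthly bill that must be cut to cover the engineer."""
    return (annual_engineer_cost / 12.0) / monthly_bill


if __name__ == "__main__":
    cost = loaded_annual_cost(90_000, 0.30)  # assumed 30% overhead
    for bill in (10_000, 30_000, 100_000):
        share = breakeven_savings_fraction(bill, cost)
        print(f"${bill:>7,}/mo bill: must save {share:.1%} to break even")
```

On a 100k/mo bill the engineer only needs to trim about a tenth of it; on a 10k/mo bill they would have to cut nearly all of it before saving anything.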

On top of that, for small/mid-sized companies it's difficult to avoid "employee lock-in", and vendor lock-in is perceived as the smaller risk. Unfortunately, they often turn out to be right.

The cost of three person-months for a reasonably competent devops person is probably close to $50,000. Maybe a bit less for a cocky junior who will make a mess, and a bit more if you pay premium freelance rates for somebody less likely to botch the job. That pays for a lot of infrastructure. Not counting your own cost is a rookie mistake. Not realizing that you really need 4-6 of these people to get to your five nines is the second mistake (you need people on call 24x7, including when someone is sick, over Christmas, etc.). So the real cost would be closer to $1M/year, just in staffing to babysit stuff you built manually... or you pay Amazon, Google, etc. and just worry about your own application not crashing. That's why this is so popular.
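The staffing arithmetic in that comment can be checked directly: derive a per-person annual cost from the "three person-months is about $50,000" estimate, then scale it to a 4-6 person on-call rotation. A minimal sketch using only the comment's own figures; the function names are illustrative.

```python
"""Sketch of the staffing arithmetic: annualize the "three person-months is
about $50,000" estimate and scale it to a 24x7 rotation of 4-6 people."""


def annual_cost_from_estimate(cost: float, person_months: float) -> float:
    # Cost per person-month, times 12 months.
    return cost / person_months * 12


def rotation_cost(per_person_annual: float, headcount: int) -> float:
    return per_person_annual * headcount


if __name__ == "__main__":
    per_person = annual_cost_from_estimate(50_000, 3)  # about $200k/year
    for people in (4, 5, 6):
        print(f"{people} people: ${rotation_cost(per_person, people):,.0f}/year")
```

At roughly $200k per person per year, 4-6 people land between $0.8M and $1.2M, which is where the "closer to $1M/year" figure comes from.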

Few companies actually need that many instances. For the fewer than 10-20 instances that the vast majority of companies actually need, the math is quite brutal: a day of your time basically pays for months or years of hosting. The thing to optimize is devops time, not hosting cost. It's by far the most expensive thing, the most likely thing to fail on you (by leaving, or by being incompetent, negligent, lazy, sick, etc.), and the hardest thing to source when you need more of it. Good devops people are scarce.

I've dealt with plenty of companies that had no more than two or three idling t2 instances yet paid multiple devops people to babysit that "infrastructure". It's stupid and wasteful. For such small instances, one hour of a decent devops person's time costs about 0.5-1 instance-years (i.e. a full year of 24x7 hosting). And scaling an instance group from 2 to 500 instances is a one-minute job if you ever need to. Unless the savings are enormous, the time they spend minimizing the number of instances or automating their deployment will never be worth the money. It's money down the drain. You need to think in terms of a few hours to get stuff done for it to be worth the cost; anything more is probably too expensive.
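The "0.5-1 instance-years per hour" figure can be reproduced under explicit assumptions: a $100/hour engineer rate and us-east-1 Linux on-demand list prices of $0.0116/hour (t2.micro) and $0.023/hour (t2.small). Both the rate and the prices are assumptions here and may be out of date.

```python
"""Sketch: express one hour of devops time in "instance-years" of a small
cloud instance. The t2 prices are assumed us-east-1 Linux on-demand list
prices and the $100/hour engineer rate is an assumption; verify both."""

HOURS_PER_YEAR = 24 * 365

# Assumed on-demand prices in USD per hour (check current AWS pricing).
T2_HOURLY = {"t2.micro": 0.0116, "t2.small": 0.023}


def instance_years_per_engineer_hour(engineer_rate: float, instance_hourly: float) -> float:
    """How many years of one instance running 24x7 an engineer-hour buys."""
    instance_year = instance_hourly * HOURS_PER_YEAR
    return engineer_rate / instance_year


if __name__ == "__main__":
    for name, price in T2_HOURLY.items():
        ratio = instance_years_per_engineer_hour(100.0, price)
        print(f"1 engineer-hour at $100 = {ratio:.2f} {name}-years")
```

Under these assumptions, an hour lands at about 1 t2.micro-year or about half a t2.small-year, matching the range in the comment.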

> And not realizing you really need 4-6 of these people to be able to get to your five nines is the second mistake (you need people on call 24x7 and when they are sick, over Christmas, etc).

If you need that kind of availability, you need to have people on call anyway to babysit your app. A good infrastructure (unless built to the minimal price point) will handle nearly all cases of hardware failure automatically, without someone having to wake up, so it's not likely to put additional load on those people.
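For scale, here is the yearly downtime budget implied by common availability targets, which shows why "five nines" is a different league from two or three nines. A minimal sketch; it uses a 365.25-day year and treats availability as a simple fraction of wall-clock time.

```python
"""Sketch: yearly downtime budget implied by an availability target."""

MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year a service may be down while still meeting the target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{target:.3%} -> {downtime_minutes_per_year(target):8.1f} min/year")
```

Five nines leaves a bit over five minutes a year, less than the time it takes someone on call to wake up and log in, which is why it has to be handled automatically if at all.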

I'm not necessarily disagreeing with your overall point, but if you need five nines, you're talking about an entirely different league of infrastructure compared to people who need two or three VMs that could also be handled by a NUC somewhere in the office (which will amortize itself against AWS in a few months).
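The NUC claim is a payback calculation: hardware cost divided by the monthly difference between the cloud bill and the local running cost. A minimal sketch in which every number is a hypothetical placeholder (a $600 box, three VMs at $40/month each, about $5/month in power); plug in real quotes before drawing conclusions.

```python
"""Sketch: months until a small office box pays for itself versus renting a
few small cloud VMs. All figures below are hypothetical placeholders."""

import math


def breakeven_months(hardware_cost: float, cloud_monthly: float, local_monthly: float) -> int:
    """Whole months until cumulative cloud spend covers hardware plus running costs."""
    saving_per_month = cloud_monthly - local_monthly
    if saving_per_month <= 0:
        raise ValueError("local running costs meet or exceed the cloud bill; no break-even")
    return math.ceil(hardware_cost / saving_per_month)


if __name__ == "__main__":
    # Hypothetical: $600 box, three VMs at $40/month each, ~$5/month power.
    print(breakeven_months(600.0, 3 * 40.0, 5.0), "months")
```

The sketch leaves out the admin time and lack of redundancy that the earlier comments raise; it only covers the hardware-versus-rent side.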

I maintained a datacenter with approximately 1000 hypervisors with a very small team, and it took only a few weeks to start running production workloads. The effort to maintain the hardware was quite small, and it was hugely cheaper than any cloud service.

Having said that, your requirement is pretty absurd. Billions of people choose to own and maintain houses and cars and cook their own food because it's cheaper than the alternatives. Nobody expects them to be professional mechanics or cooks.

> Nobody expects them to be professional mechanics or cooks.

To be fair, there’s also nobody that would listen to them over a professional mechanic or cook.

And they wouldn't open a commercial restaurant on top of their DIY "home food infrastructure".

Airbnb? :)

There's a very wide variety of home professionals, especially among YouTubers: some are fantastic, while others …not so much.

Which 500 instances? 500 EC2 "large" instances are like half a rack… or is the point to engage your opponent in an unwinnable argument?
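The "half a rack" estimate is a sizing calculation: total vCPU and RAM of the fleet divided by what one physical server provides. A minimal sketch: the m5.large shape (2 vCPU, 8 GiB) is AWS's published size, while the server (128 hardware threads, 512 GiB) is an assumed dual-socket box. It ignores overcommit, N+1 spares, and storage.

```python
"""Sketch: physical servers needed to hold N cloud instances, with no
overcommit or headroom. The server spec is an assumption."""

import math


def servers_needed(instances: int, vcpu: int, mem_gib: int,
                   server_threads: int, server_mem_gib: int) -> int:
    # Whichever resource runs out first decides the server count.
    by_cpu = math.ceil(instances * vcpu / server_threads)
    by_mem = math.ceil(instances * mem_gib / server_mem_gib)
    return max(by_cpu, by_mem)


if __name__ == "__main__":
    # 500 x m5.large (2 vCPU, 8 GiB) on assumed 128-thread / 512 GiB servers.
    print(servers_needed(500, 2, 8, 128, 512), "servers")
```

With this assumed spec, the whole fleet fits on 8 servers before adding spares, which supports the commenter's point that 500 "large" instances are not a lot of hardware.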

I think I can beat that argument pretty easily: 5 FreeBSD hosts, each running 100 jails. It's cheaper, uptime is something to discuss, but the data is not on someone else's computer.

The real cost of the cloud is that nearly anyone thinks it's impossible to set up infra for yourself... Losing the sysadmin as a role in companies is probably the biggest loss.

Not "anyone", but probably most newcomers to the industry. Since they are simply not exposed to the non-cloud ways of setting up infra.

On every AWS/cloud post here, the first comment is usually "rent dedicated boxes from Hetzner (or whomever) and you can cut your costs". (And especially now with k8s, it is really, really easy to have something sane on bare-ish metal.)

But at the same time, what the "cloud" gives people is 20+ PoPs around the world: basically giant hosting companies with an endless list of bells and whistles.

What's interesting is that in the past 5-10 years the technology to run your own smaller DC (on the order of a few dozen racks) has become extremely commoditized. You can buy 100G switches for pennies now, and every piece of software has a high-quality open source version, down to the BMC level. I believe we'll see a reverse trend for established SV companies in the next 10 years.

But at the same time, "established SV companies" are already paying so much in labor costs that they probably don't want to hire anyone to run a DC for them.

(Maybe they'll really start hiring remotely. Maybe not.)

Just task their existing staff to "run the DC". That's way easier to figure out than grinding LeetCode, if you ask me. Also, you can rent pretty much everything in that chain these days, so your existing "infra engs" can manage the software layer while you outsource all the management of hardware and below.

Assuming the 500 instances are something like m5.4xlarge, the on-demand cost would be about $3M per year. You save something with reserved instances, but have other excessive costs like egress, so that should be the right order of magnitude.
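The $3M figure follows from instances × hourly price × hours per year. A minimal sketch assuming $0.768/hour, the us-east-1 Linux on-demand list price for m5.4xlarge at the time of writing (check current pricing). It excludes egress, storage, and reserved-instance discounts, as the comment notes.

```python
"""Sketch: annual on-demand cost of a fleet running 24x7. The m5.4xlarge
price is an assumed us-east-1 Linux on-demand list price."""

HOURS_PER_YEAR = 24 * 365


def annual_fleet_cost(instances: int, hourly_price: float) -> float:
    return instances * hourly_price * HOURS_PER_YEAR


if __name__ == "__main__":
    cost = annual_fleet_cost(500, 0.768)  # assumed m5.4xlarge price, USD/hour
    print(f"${cost:,.0f} per year")
```

That comes to roughly $3.4M a year, so even a project costing several times the ~$50k three-month budget from the original rule would be small next to a saving of half the bill.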

So why do you limit a project that's supposed to save $1M+ to a cost of $50k or so?

I've learned a lot from managing school networks for public schools: thousands of users running on either repurposed or gifted hardware from 8+ years ago.

In 2013 I was running a school with 500 students and 70 teachers on an HP server with 8 GB of RAM, built in 2005, and had no problems other than disk speeds for network transfers.

The same setup in the cloud would have been much more expensive, but then again I had (and have) access to unlimited Microsoft product licenses because of the MS-ACH agreement, so take that with a grain of salt. They even give every public school in the country its own unlimited KMS host key.