Hacker News
by dijit 1340 days ago
I really agree with you. What's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud.

There's also something to be said for buying a VPS or a Colo machine, making sure it's backed up and dealing with the 9's that you get from that machine on its own. I am routinely surprised by how far a single node machine will get you.


> what's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud

It costs a lot of money to run your own datacenters, and very, very few companies are capable of doing it as well as AWS, or even Scaleway/OVH, can. What I mean is that in-house you end up waiting weeks or months to get through tickets, approvals, and multiple different teams just to get a server deployed, then waiting a few more weeks for monitoring and backups.

Letting developers and related staff get hardware/software on a whim is a massive advantage.

If you open a ticket with a real remote hands you should get a response back in minutes; typically someone will be on-site in your cage in under an hour. You also don’t “deploy a server” to reduce load; you plan ahead every few months and deploy thousands at a time. Even then, if you have a good relationship with a really good systems integrator, they can ship and rack machines in a matter of days, not weeks.

I’m increasingly convinced that the large-scale companies that don’t do bare metal don’t do it because they never learned how, and all the people advising them have either never done bare metal or have done it poorly, so it’s like the blind leading the blind. But by doing so they are leaving 50-90% cost savings, better control over reliability, latency, data residency, etc. on the table.

> If you open a ticket with a real remote hands you should get a response back in minutes; typically someone will be on-site in your cage in under an hour

Remote hands won't order your servers, configure your networking, install OSes/configure your PXE, and all the other tedious things running your own DC entails.

Yes, most DIY DCs are done terribly; that's the whole point. If so many people struggle with it, doesn't it make sense to just outsource it?

They will do whatever is in the contract. Yes, they will hook up a crash cart and PXE the server (if you need that); I did it hundreds of times in the old days. However, in modern datacenters no one "installs OSes and configures the network". You plug it in, turn it on, everything self-provisions and starts serving traffic.

> However in modern datacenters no one "installs OSes and configures the network". You plug it in, turn it on, everything self-provisions and starts serving traffic.

Absolutely agreed. That's partly what I used to do, and it's a massive effort to do everything automatically and efficiently; it needs multiple people's time to create and maintain all the infrastructure, the glue between different systems, scripts, and tools. Even components as basic as DHCP are absolutely awful: your options are either something from the 1990s, isc-dhcp-server, which lacks an API in any real sense, or Kea, made by the same people, which really shows. Before Tinkerbell there was literally nothing that could be used to automate such a thing at scale.

And more to my point, how many datacenters do you think are "modern"? I've only encountered one that was starting to get there (where I used to work, before an acquisition by a company with arcane practices in its DCs), and having worked with hundreds of customers with "on prem" stuff, for the vast majority it's a legacy horror show.

The price for colocating in a Tier 2 datacenter close enough for me to walk to was $75/amp, which is about the same cost as AWS.

You're really paying somewhere between the savings of putting a datacenter in Nowhere, Oregon and the cost to convince someone to live there.

How can you directly compare $75/amp to AWS? This very much depends on the kind of load you have.
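A rough sketch of why the comparison depends on load, under loudly hypothetical assumptions: the $75/amp figure is treated as a monthly charge per amp of a server's draw, rack space and bandwidth are ignored, and the cloud hourly rate is made up. The colo circuit is paid for whether or not the box is busy, while an on-demand instance that only runs part of the time is billed only for those hours.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def colo_power_cost(price_per_amp: float, amps_per_server: float) -> float:
    """Monthly colo power cost for one server, assuming billing per amp of draw.

    Hypothetical model: ignores rack space, cross-connects and bandwidth.
    """
    return price_per_amp * amps_per_server


def cloud_monthly_cost(hourly_rate: float, utilization: float = 1.0) -> float:
    """Monthly on-demand cost for one instance running `utilization` of the month."""
    return hourly_rate * HOURS_PER_MONTH * utilization


# Hypothetical numbers: a server drawing 2 A vs. a $0.20/hour instance.
print(colo_power_cost(75, 2.0))       # fixed, regardless of load
print(cloud_monthly_cost(0.20))       # instance running 24/7
print(cloud_monthly_cost(0.20, 0.5))  # instance running half the time
```

With a steady 24/7 load the two land in the same ballpark; with a bursty load the on-demand side shrinks while the circuit cost does not.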

Also, AWS's egress likely costs much more than the datacenter's; again, possible savings very much depend on the kind of load you have.

I'd agree that unless you can really profit from having very specific hardware, you're better off renting dedicated servers than colocating servers you own. Have somebody else worry about having people on call to switch out failed hard drives.

Comparing either to AWS will inevitably lead to a much more complex discussion about spot instances, traffic costs, ancillary services etc.

100% Agree.

I don't like the idea that the only way to get developers moving is to use cloud, but I agree that it's a solid replacement for really bad ops.

What I've seen in many places is an abstraction over bare metal, some better than others: OpenStack, on-prem Kubernetes, VMware, etc. are all solutions with differing amounts of adoption. Ubisoft had a lot of stuff in this area, as does Google. Ubisoft's was pretty terrible, though.

If you need a physical machine to be deployed, you've hit a certain level of scale and your load is much better known; even though it can take a few weeks, what you get back is quite competitive.

But if you're waiting for hardware to get anything moving in the first place then that's obviously bad.

What I've taken to doing is prototyping on Google Cloud and then planning to migrate things to on-prem once everything is reaching maturity.

It also lets a CFO convert capex to opex (which may or may not have tax implications), and you eliminate a cost center from your balance sheet (turning it into service payments), which makes CFOs look better, even if it's net worse for the company.

CFOs don't care about CapEx vs OpEx.

At least mine doesn't seem to care even slightly, nor did my previous one.

I'm fairly certain this is an argument that we in the tech community make because we heard someone else make it.

When you ask finance people directly, they say having capital expenditure on the books is not a problem.

Ok. I guess I just worked for one CFO who obsessed about it, so for me n=1.

Well, n=2 for me, but that's not to say you're wrong.

I reached the conclusion that it's basically a self-perpetuating myth after asking my CFOs directly, because it comes up often.

Curious, though: where are you based? My CFOs were German and... German (but the second was working in Sweden with Swedish financial rules).

Maybe it is actually different depending on location or there's some other factor at play?

USA, so we have weird tax rules. Also, I dunno, the CFO could have been drinking the sales team's Kool-Aid (we were also trying to sell enterprises on moving to the cloud). And there was a lot of drinking (and hookers and probably blow) at that company.

> It costs a lot of money to run your own datacenters,

If you actually do the math, it is pretty much a wash vs. using AWS. Yes, you will pay a lot more up front, but over a 5-year period (standard warranty length and typical depreciation time) it pretty much evens out compared to AWS. I am sure there are many use cases where on-prem would actually be cheaper than AWS over 5 years.
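A minimal sketch of that 5-year math, with made-up numbers. It assumes the owned hardware is a single up-front purchase plus a flat monthly running cost, and it ignores financing, resale value, egress and staff time beyond that flat cost, which is usually where the real argument is.

```python
import math
from typing import Optional


def cumulative_on_prem(capex: float, monthly_opex: float, months: int) -> float:
    """Up-front hardware purchase plus flat running costs (power, space, etc.)."""
    return capex + monthly_opex * months


def cumulative_cloud(monthly_bill: float, months: int) -> float:
    """Pay-as-you-go spend with no up-front cost."""
    return monthly_bill * months


def break_even_month(capex: float, monthly_opex: float, monthly_bill: float) -> Optional[int]:
    """First month in which cumulative on-prem spend is no higher than cloud spend.

    Returns None if owning never catches up (running costs >= cloud bill).
    """
    monthly_saving = monthly_bill - monthly_opex
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Hypothetical: $30k of servers costing $500/month to run, vs. a $1,500/month cloud bill.
months = 5 * 12  # the 5-year warranty/depreciation window
print(break_even_month(30_000, 500, 1_500))
print(cumulative_on_prem(30_000, 500, months), cumulative_cloud(1_500, months))
```

Whether it is "a wash" or a clear win depends on where the break-even month falls relative to the 60-month window.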

At the companies I work for, the red tape isn't nearly as bad as you make it seem (or have perhaps experienced at places you have worked). The biggest time sink right now is the ongoing supply chain issues and vendors just not having equipment; the approvals/tickets are pretty quick where I work.

If you have a little spare capacity, developers can still get hardware/software on a whim, at just a (comparatively) small one-time expense. Spare capacity is much cheaper than people make it out to be.

The upside is that it's much cheaper once you're at the scale where you no longer need to variableise your compute costs, but can tank the up-front fixed costs and do proper capacity planning.

> If you have a little spare capacity, developers can still get hardware/software on a whim, at just a (comparatively) small one-time expense. Spare capacity is much cheaper than people make it out to be.

Those people are speaking from greater experience - there are many things which seem easy but aren’t once you’re over a certain scale, and at large organizations you often have things like conflicting policies or coordinated demand (e.g. your slack capacity disappears when every project is trying to hit the same budget deadline or a change moratorium ends, a pipe breaks in building A and you need to shift a ton of previously-stable systems for 6 months, etc.).

You can do that kind of capacity planning well, but it’s harder than it looks and often politically challenging because the benefits aren’t obvious. Cutting corners looks like saving money right up until it doesn’t. If you aren’t buying servers by the hundred or storage by the petabyte, you are unlikely to be competitive with a cloud service without sacrificing some combination of performance, reliability, timeliness, and security.

I think you and I are saying basically the same thing, only putting emphasis on different aspects due to our various historic experiences.

Look at your demand pattern (variable or stable, predictable or unpredictable) and what cost structures your finances can support (variable or fixed, up-front or as-you-go), pick a solution based on that, not what's cool.

I'd also add your barriers to entry: staff, facilities, process, etc. Saving on servers compared to a cloud provider doesn’t help if your procurement process leaves people sitting idle for 6 months.

Equinix Bare Metal. https://metal.equinix.com/

They’re a $6B revenue company that most people haven’t heard of. Their expertise is in building data centers for other data center companies.

> how far a single node machine will get you.

This. I took the wrong lesson from the DDoS attacks on Linode in late 2015 (particularly the one on Christmas Day), and the intermittent issues I encountered with DigitalOcean and Vultr in 2016 while both providers were still fairly young. A single dedicated server from a mature provider (ideally not during its hyper-growth phase) is pretty reliable.

It's not weird.

Many mega-corps are extremely bloated and dysfunctional. Their IT (private) cloud teams are slower and less competent.

With public cloud, a small team can be fully responsible for all their resources with crystal clear cost accounting.

These megacorps likely have IT which is not mega enough to justify owning a massive datacenter.

The right scale is Amazon, Google, Facebook, Microsoft. Likely far fewer than a hundred companies in the entire world.

Every internal IT department was always slower than any cloud offering.

Or lagged behind on features.

Or had underlying infra issues.

With AWS, at one startup I was able to build and maintain infrastructure where you would have needed a small team just 10 years before.

At a certain scale you have more negotiating power, so it probably makes more sense from that perspective.

Data residency: scaling your own DCs out to dozens of global regions is not cost-effective.

Sorry, the next step after cloud isn't running your own DCs.

It's renting space from someone who is doing that for you.

The scale of cost savings/ownership as you scale kinda goes:

FaaS -> PaaS -> Cloud -> VPS -> Rented Hardware -> Rented space in DC -> Own DC.

Let's avoid conflating everything to the right of Cloud with the most difficult form of it, because nobody is going there from nothing.

> what's weird though is how many mega-corps are going away from Custom Hardware in Custom Built DC towards Cloud.

Why is it surprising? Building and maintaining custom data centers is a big, slow business initiative. It takes months to years of forecasting to get the data center buildout to match the business needs, as opposed to the extreme flexibility of using a cloud provider.

> There's also something to be said for buying a VPS or a Colo machine, making sure it's backed up and dealing with the 9's that you get from that machine on its own. I am routinely surprised by how far a single node machine will get you.

For personal projects this is exactly what I do. It’s great until something goes wrong with that one machine or VPS.

But it’s not really a good option for any business that needs consistent operations and uptime. Years ago I worked at a company that tried to self-host some of their collaboration tools on a VPS to save money over the cloud-hosted versions. When the server went down it stalled productivity for a day while the team restored a backup, with another week of confusion as we tried to find all of the things that were lost between the last backup and when the server went down.

When someone did the rough estimations on how much it cost to pay everyone’s salaries for that day of lost productivity, the number was far higher than the trivial cost savings we got from self-hosting. We also had a constant background burden on someone internally to maintain and monitor the server, plus the burden of them being on call. Often, moving to cloud anything can be a huge load off the company’s back.
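The "rough estimation" here is simple arithmetic. A sketch with hypothetical headcount, salary and savings figures, assuming a 2,080-hour working year (40 hours x 52 weeks); it counts only idle salary and ignores lost revenue and the week of cleanup afterwards.

```python
WORKING_HOURS_PER_YEAR = 40 * 52  # assumed full-time schedule: 2,080 hours


def outage_salary_cost(employees: int, avg_annual_salary: float, hours_lost: float) -> float:
    """Salary paid for time in which `employees` could not work because of an outage."""
    hourly_rate = avg_annual_salary / WORKING_HOURS_PER_YEAR
    return employees * hourly_rate * hours_lost


def yearly_savings(monthly_saving: float) -> float:
    """Annualized savings from self-hosting instead of paying for the hosted version."""
    return monthly_saving * 12


# Hypothetical: 50 people idle for one 8-hour day vs. saving $200/month by self-hosting.
print(outage_salary_cost(50, 104_000, 8))
print(yearly_savings(200))
```

Under these made-up numbers, a single lost day wipes out several years of savings.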

> as opposed to the extreme flexibility of using a cloud provider.

I don't really buy this honestly.

What you buy with cloud providers is quality tooling, not flexibility.

If you're bin-packing with Kubernetes properly then capacity is capacity and it doesn't matter if the marketing department are using it or the developers are. You just buy a bunch of servers and when you see the load approaching 70% you buy more. It's a 2 person job.
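A sketch of that "buy more at 70%" rule, treating the cluster as one fungible pool the way bin-packing allows. `used_cores` stands for the summed resource requests across the cluster; the 50% refill target and the cluster size are hypothetical, and hardware lead time is ignored.

```python
import math


def servers_to_buy(servers: int, used_cores: float, cores_per_server: int,
                   threshold: float = 0.70, target: float = 0.50) -> int:
    """Servers to add once cluster utilization reaches `threshold`.

    When triggered, buys enough to bring utilization back down to `target`
    (the 50% target is a hypothetical choice; 70% is the trigger from the comment).
    """
    capacity = servers * cores_per_server
    if used_cores / capacity < threshold:
        return 0
    needed = math.ceil(used_cores / (target * cores_per_server))
    return max(needed - servers, 0)


# Hypothetical cluster: 20 servers x 64 cores, with 960 cores requested (75%).
print(servers_to_buy(20, 960, 64))
```

It doesn't matter which department the 960 cores belong to; the rule only looks at the pool.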

Is it harder? Yes. Definitely.

Is it a panacea? No. Not at all.

Is it universally cheaper? Also no. Definitely not.

I feel like whenever I talk about the Cloud as an expensive thing, people get emotionally defensive.

I'm not here to take your toys away.

Services like cloud are just tools and tools always have pros and cons.

If you can't reasonably discuss the cons without resorting to "I need to hire more staff" or "it's a lot better than $strawman", then we're just cargo culting.

> it’s not really a good option for any business that needs consistent operations and uptime.

Most business cases for computers can just eat the downtime, honestly. Your URL redirector doesn't need 5 9's. There's a sliding scale of complexity and uptime: on one side you have a single hosted server with profoundly strong uptime (especially given the redundancies in normal servers); then you start adding complexity to get HA, and, weirdly, the complexity lowers the reliability.

If you keep following the line of redundancies and HA complexity, eventually you can get to a point where the service is even more reliable than a single node, which is what everyone assumes they will get straight away, but usually it's a lot of work to get there.
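The textbook series/parallel availability model shows the same curve. This is a sketch with a hypothetical 99.9% per box, assuming independent failures and perfect failover; real HA adds failure modes of its own (failover bugs, split brain) that this model ignores and that make the middle step worse still.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component in the chain must be up (e.g. load balancer -> app -> database)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one replica must be up. Assumes independent failures and perfect failover."""
    return 1 - prod(1 - a for a in availabilities)


A = 0.999  # hypothetical availability of any single box

single_node = A
# First step toward HA: two app servers, but a new load balancer and a separate database.
partial_ha = series(A, parallel(A, A), A)
# Much later: every tier is redundant.
full_ha = series(parallel(A, A), parallel(A, A), parallel(A, A))

print(f"{single_node:.6f} {partial_ha:.6f} {full_ha:.6f}")
```

Each new single point of failure multiplies in, so the partial setup lands below the single node until every tier has a replica.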

> the trivial cost savings

This will differ a lot.

I made two games: one was hybrid-cloud and one was bare-metal only, and the cost savings were not trivial. If we had moved the hybrid deployment 100% to the cloud, we would easily have paid 10x the hosting costs, which would have been enough money to pay for 250 contractors at a premium rate.

That said: the toys were definitely shiny.

Cloud also offers flexibility in buying equipment "just in time" and returning it when projects are cancelled. You don't need a 6-18 month lead time to acquire and install hardware which then gets wasted if the project gets cancelled or rescoped. Having massive capex projects converted to opex is very appealing for a lot of businesses.