Hacker News
by moksly 1819 days ago
I think the article's headline is a little rude to Kubernetes. I’m by no means a fan of Kubernetes, especially not in non-tech enterprise, but the article is really about the unpredictable and rising cost of moving into the cloud that is owned by the big tech companies, isn’t it? Sure, Kubernetes can be part of that, but you can easily run into the same predicament without it.

The unpredictability of cost is actually the prime reason we stuck to our own cloud, where we rent (technically we buy the hardware that the company hosts, but it’s not really ours, we just use it till it breaks) the iron at a known rate. Which is just better for a public sector budget than paying by mileage, at least if anyone outside of the IT department bothers to look into what they are signing off on.

The really interesting part will be where we go from here. Moving from self-hosted to rented iron that we run our virtual servers on was a fairly simple move that would be easy to reverse. The move into the cloud is even easier, but unless you’re careful, it could be very costly to get out.


Unfortunately AWS is the new Oracle. No one ever gets in trouble for picking it, and it's a great way to make it look like you, as a high-up exec, provide value: look how fast we are iterating now with my decision. It almost always ends in a mess of unmaintainable, poorly thought-out services that someone else has to come and clean up or move to the next proprietary service.

The last 5 years have been soul crushing for me as someone who actually enjoys managing datacenters. We have seen time and time again that having your own DC leads to much better visibility and control over spending, as well as lower cost. Not to mention the huge advantage when negotiating with cloud vendors if you are a mid-size or larger company.

So time and time again I have had to transition out of environments you can reason about into AWS and become a glorified support engineer, but I guess that's what companies need nowadays: someone who will read the docs the other engineers don't want to and troubleshoot all the issues, because AWS is so easy.

I'm glad I got to learn how the “cloud” works though, as I likely never would have been drawn to infra and programming in this day and age.

The problem with running your own DC is growing past your planned capacity. There's often a huge delay between developers having to put up with under-resourced machines on the VM infrastructure and more capacity being approved.

As a developer I've put up with over-subscribed VMware clouds and I vastly prefer the Azure/AWS option.

There are a lot of bad ways to run your own hardware. Limiting your infra to EC2 for burst capacity provides an easy escape hatch if you need one. However, I have never run into the issue of not calculating CPU capacity properly, and I would never use VMware either; that sounds like IT is running your DC. I could see this being the case for some tiny startup that just owns a small amount of rack space, though.

Of course you can pay the cloud providers to deal with your company's bad planning. That's what most do.

I also would never advocate a DC for everything. As a startup it likely makes no sense to run your own hardware, and it likely doesn't make sense to run k8s either; however, one of those is completely acceptable. Once you get to more predictable growth, owning your own hardware starts to look more attractive, but most don’t know how to calculate it properly, and finance likes to make it murky with capex and opex buckets.

I recently experienced this firsthand, in a company which owned no computers beyond employee laptops. The product was entirely built of AWS services created by a pile of Terraform spaghetti. It was only really understood by someone whose superpower was the ability to keep an apparently unlimited number of levels of indirection in his head.

I hear they might need to move it all to Azure soon!

Terraform just does too much and is abused. It's great for simple config, but it quickly turns into custom modules to make things easier. Eventually those modules need to change and stuff breaks, but you never know until someone tries to run it again. Then you just pray no one tries to manage their database with it.

Additionally, as it's put together over time, if you actually had to recreate the environment it would never work. I spent time automating the recreation of an environment and quickly stopped. Terraform is the illusion of infra as code and fails miserably at any scale.

You know you can lock module versions, right? And of course the environment recreation needs to be tested periodically if you have a disaster recovery plan.

Locking modules only works if all of your modules are pulled in through the registry. That isn't an option at some places.

Testing environment recreation is impossible at a certain scale. Try it when you have hundreds of people adding Terraform code, most of whom only know how to copy and paste.

> I hear they might need to move it all to Azure soon!

I'm not sure if you made that post as a joke or not...but you just described a certain San Francisco based startup that operates in the SPF/DKIM/DMARC space...

Which company (if you can say) were you describing?

Many people have been fired after AWS migrations resulted in massive cost increases. But few people want to talk about failed projects at hospitals, etc., where the cloud is a poor fit.

IT has always had issues with people cargo culting solutions without understanding the details, and the cloud is no different.

I agree, it's just the current one that affects me. We did have someone get fired over an AWS migration at my first startup, but they just turned around and tried to do it again. It took one year for a single application that only lived on a single server, and we never did anything more advanced before we sold.
Are those of us happy/comfortable with current major cloud offerings simply not speaking up here? I can't fathom running a data center any longer for a company of nearly any size. Given my team's responsibilities, this would require 2-3x more headcount with significantly worse SLA/SLO if we still ran our own datacenter. Maybe it's not such a big deal for places with constant demand? Or is this just a case of observer bias?
It depends on what you do.

Cloud boxes are insanely expensive (easily 10x the price of the equivalent in house box, taking hosting, power, cooling, hw into account).

To make this work, you need a combination of variable demand, and only paying for partial salaries (your cloud boxes are mutualized with other people's boxes).

If you're a reasonably big company (tens of thousands of servers), with fairly stable demand and adequate capacity planning, you won't necessarily save a huge amount of money by outsourcing your DCs. You can argue that the GCP/AWS guys are better than you at running fleets of servers and data centers, but at 10x the price, it's worth double checking. If all I do is raw compute 100% of the time on a very large scale, it's extremely likely I want to do it myself.

Obviously, there's more than raw hardware to the cloud, starting with all kinds of managed services, which can be worth it. Again, you'll have to do the maths: 10x for the boxes, then extra for the distributed db? Does it give me a competitive advantage? Better time to market?

In the end, there are good use cases, and bad use cases for the cloud, and I don't think it's as clear cut as what you say.

EDIT: if cloud hardware prices were not completely ridiculous (say 2x), then it might suddenly be a lot more compelling, and I would most likely agree with you (security / regulatory issues aside).
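To make "do the maths" concrete, here is a minimal Python sketch of the trade-off described above. It assumes a deliberately crude model that is not from the thread: an owned fleet is sized for peak demand and costs 1 unit per box, while cloud boxes cost `price_multiplier` units but are only paid for when in use. Staffing, managed services and commitments are ignored, and the function names are made up for illustration.

```python
def break_even_utilization(price_multiplier: float) -> float:
    """Average utilization (0..1) below which renting beats owning.

    Owned cost  = peak_boxes * 1 unit (paid whether used or not).
    Cloud cost  = average_boxes * price_multiplier units.
    Cloud wins when average_boxes / peak_boxes < 1 / price_multiplier.
    """
    return 1.0 / price_multiplier


def cloud_is_cheaper(avg_utilization: float, price_multiplier: float) -> bool:
    """True if paying only for used boxes at a markup beats owning for peak."""
    return price_multiplier * avg_utilization < 1.0


if __name__ == "__main__":
    for multiplier in (10, 2):  # the 10x and 2x figures quoted in the comment
        threshold = break_even_utilization(multiplier)
        print(f"{multiplier}x markup: cloud wins below {threshold:.0%} average utilization")
```

Under this model, a 10x markup only pays off if the owned fleet would sit below 10% average utilization, while at 2x the threshold rises to 50%, which is roughly why the 2x case looks so much more compelling.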

I think you can find better cloud prices (outside of AWS) that make sense. Running a datacenter is hard; running your own servers (software only) is much easier.

I can get this dedicated server in Germany (courtesy of hetzner.com) for 40 eur / 47 usd per month (albeit usd is relatively weak right now):

CPU: Intel® Core™ i7-6700 Quad-Core
RAM: 64 GB DDR4
Drives: 2x 512 GB NVMe SSD

Building an equivalent computer would cost me around 1300 eur / 1540 usd without VAT (assuming the RAM is ECC).

Let's assume 12.5 eur / 15 usd of electricity per month, and the computer will pay for itself in 2 years. It's hard to price maintenance, issues and hardware failures, but I feel like this computer could last 4-5 years, making the price roughly 2x.
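For anyone who wants to redo this arithmetic, here is a minimal Python sketch using only the figures quoted above (1300 eur hardware, 40 eur/month rent, 12.5 eur/month electricity). The function names are made up for illustration, and the model ignores maintenance, failures, hosting and bandwidth. Note that if electricity is counted as an ownership cost, the break-even comes out closer to four years than two, so the result is quite sensitive to what you include.

```python
import math


def break_even_months(hardware_cost: float, rent_per_month: float,
                      power_per_month: float) -> float:
    """Months until the bought machine becomes cheaper than renting."""
    monthly_saving = rent_per_month - power_per_month
    if monthly_saving <= 0:
        return math.inf  # owning never catches up
    return hardware_cost / monthly_saving


def rent_to_own_ratio(hardware_cost: float, rent_per_month: float,
                      power_per_month: float, lifetime_months: int) -> float:
    """Total rent divided by total ownership cost over the machine's lifetime."""
    rent_total = rent_per_month * lifetime_months
    own_total = hardware_cost + power_per_month * lifetime_months
    return rent_total / own_total


if __name__ == "__main__":
    months = break_even_months(1300, 40, 12.5)
    ratio_5y = rent_to_own_ratio(1300, 40, 12.5, 60)
    print(f"break-even after {months:.1f} months")  # 47.3 months
    print(f"rent / own over 5 years: {ratio_5y:.2f}x")  # 1.17x
```

Leaving electricity out of the ownership side (for example, if it is treated as included in a colocation fee that is priced separately) moves the five-year ratio much closer to the 2x mentioned above.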

When we pushed a bunch of teams at a large retailer from EC2 to Kubernetes in EC2 (not EKS), we reduced overall cloud spend by ~30%.

It's only unpredictable and unquantifiable if you don't look at it. Is the problem that they don't have the right tools to look at it yet?

The flip side of this is I was able to reduce an employer's cloud spend by 80 percent by just _fixing_ the mistakes they made when migrating to Kubernetes. We were still on k8s afterwards; it's just that the first pass made a bunch of naive assumptions and poor optimizations.
Renting the hardware is not necessarily a cost-saving measure though: how much of the compute/storage capacity you have is sitting idle in your datacenter? That's the whole point of finops: you need to have full visibility into the usage of your infrastructure so you can optimize the spend.
OP didn't say anything about cost saving but rather predictability.
Gosh, there is a job title for capacity planning?
Do capacity planners buy and sell capacity to scale up and down month to month? :)
When does the need ever go down?
If you're retail, it goes down after Christmas. If you're a tax company, it goes down after May. If you sell a product, it goes down when you go long enough without releasing anything new.
At night, often. For example, I have had a use case where we needed a 1000 node build farm during the day when developers were working, but only 50 at night. Machine learning jobs are another common source of workloads that need burst capacity.
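As a rough illustration of how much capacity a day/night pattern like this leaves idle, here is a small Python sketch using the 1000 and 50 node figures from the comment. The 10-hour working day is an assumption, not something stated in the thread, and weekends are ignored.

```python
def average_nodes(day_nodes: int, night_nodes: int, day_hours: float) -> float:
    """Time-weighted average number of nodes needed over a 24-hour day."""
    night_hours = 24 - day_hours
    return (day_nodes * day_hours + night_nodes * night_hours) / 24


if __name__ == "__main__":
    # day_hours=10 is an assumed working day, not a figure from the thread
    avg = average_nodes(day_nodes=1000, night_nodes=50, day_hours=10)
    utilization = avg / 1000  # share of a fixed 1000-node fleet actually used
    print(f"average nodes needed: {avg:.0f}")
    print(f"utilization of a fixed 1000-node fleet: {utilization:.0%}")
```

With these assumptions, a fleet sized for the daytime peak would be used at well under half its capacity on average.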
What company has constant load 24/7/365?
I’m going to guess Visa.
It is not necessarily a cost-saving measure, but it can be in some cases. In a project I was involved with, we came to the conclusion that we would pay AWS, every three months, the cost of the hardware that we would buy ourselves. I am aware that AWS includes hosting and services. Nevertheless, that is a very big difference.
Still amazing to me that keeping capacity in reserve is now demonized