by tutfbhuf 564 days ago
I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.

From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.

On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe storage, MariaDB operators, Cilium for networking, and ArgoCD for deploying Helm charts. We had to handle Kubernetes cluster updates ourselves, which included facing a complete cluster failure at one point. We also encountered various bugs in both Kubernetes and Ceph, many of which were documented in GitHub issues and Ceph trackers. The list of tasks to manage and monitor was endless. Depending on the number of workloads and the overall complexity of the environment, maintaining such a setup can quickly become a full-time job for a DevOps team.

In contrast, using AWS or other major cloud providers allows for a more hands-off setup. With managed services, maintenance often requires significantly less effort, reducing the operational burden on your team.

In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.

Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.
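To make the trade-off concrete, here is a minimal TCO sketch. Every figure in it (the bills, the ops hours, the hourly rate) is a hypothetical placeholder, not data from the setups described above; the only number borrowed from the comment is the "20% of the AWS bill" ratio. It also ignores migration, training, and outage costs.

```python
from dataclasses import dataclass


@dataclass
class HostingOption:
    """One hosting option. All figures are hypothetical placeholders."""
    name: str
    monthly_bill: float  # provider invoice per month
    ops_hours: float     # DevOps hours spent on upkeep per month
    hourly_rate: float   # fully loaded cost of one DevOps hour

    def monthly_tco(self) -> float:
        # Simplified TCO: provider bill plus staff time.
        return self.monthly_bill + self.ops_hours * self.hourly_rate


def break_even_ops_hours(cheap: HostingOption, managed: HostingOption) -> float:
    """Monthly ops hours the cheaper option can absorb before its TCO matches the managed one."""
    return (managed.monthly_tco() - cheap.monthly_bill) / cheap.hourly_rate


# Hypothetical example: the Hetzner bill is 20% of the AWS bill.
aws = HostingOption("AWS", monthly_bill=10_000, ops_hours=20, hourly_rate=100)
hetzner = HostingOption("Hetzner", monthly_bill=2_000, ops_hours=60, hourly_rate=100)

print(aws.name, aws.monthly_tco())
print(hetzner.name, hetzner.monthly_tco())
print("break-even ops hours:", break_even_ops_hours(hetzner, aws))
```

With these made-up numbers, the self-managed option stays cheaper only as long as its upkeep stays under 100 DevOps hours a month.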

7 comments

This is definitely some ChatGPT output being posted here, and your post history also has a lot of this "While X, Y also does Z. Y already overlaps with X" output.

I'd like to see your breakdowns as well, given the cost difference between a 2 vCPU, 4 GB configuration on Hetzner (as an example) and a similar configuration on AWS, which is priced much higher.

There's also https://github.com/kube-hetzner/terraform-hcloud-kube-hetzne... to reduce the operational burden that you speak of.

It is my output, but I use ChatGPT to fix my spelling and grammar. Maybe I should refine my prompt so it doesn't alter the wording so much.
While using ChatGPT to enhance your writing is not wrong by any means, reviewing the generated output and re-editing when necessary is essential to avoid a robotic writing style that may read as inhuman. For instance, the successive paragraphs "In contrast, using AWS.." and "In essence, with AWS.." leave a bad taste in your brain when read consecutively.
I agree with you; I failed on that one.
> I use ChatGPT to fix my spelling and grammar

I have a better suggestion, which will save time, energy, money, and human work.

Don't.

Write it yourself. If you can't, don't post.

Why would you want to restrict contributions from people with relevant experience and willingness to share, just because the author ran a spelling and grammar check?
Unless the spelling and grammar are HORRENDOUS, people won't really care. Bad English is the world's most used language; we all deal with it every day.

Just using your browser's built-in proofreader is enough in 99.9% of the cases.

Using ChatGPT to rewrite your ideas will make them feel formulaic (LLMs have a style and people exposed to them will spot it instantly, like a code smell) and usually needlessly verbose.

You can tell it's AI when it refuses to take a side and equivocally considers issues first on one hand and then the other hand, but can't get the number of fingers right.

Or as ChatGPT would put it:

Precise grammar and spelling are undeniably important, but minor imperfections in English rarely obstruct communication. As the most widely used language in the world, English is highly flexible, and most people navigate small errors without issue. For the majority of cases, a browser’s built-in proofreader is entirely sufficient.

On one hand, tools like ChatGPT can be valuable for refining text and ensuring clarity. On the other hand, frequent reliance on such tools can result in writing that feels formulaic, especially to those familiar with AI-generated styles. Balancing the benefits of polished phrasing with the authenticity of your own voice is often the most effective approach.

It’s overkill for this audience. HN is pretty forgiving of spelling and grammar mistakes, so long as the main information is clear. I’d encourage anyone that wants to share a comment here to not use an LLM to help, but just try your best to write it out yourself.

Really - your comment on its own is good enough without the LLM. (And if you find an error, you can always edit!)

If we really wanted ChatGPT’s input on a topic (or a rewording of your comment), we can always ask ChatGPT ourselves.

Everyone claims it’s a spelling and grammar check, but it’s the OP trying to spread “we tried running self-managed clusters on Hetzner and it only saved us 20% while being a chore in terms of upkeep” into a full essay that causes all that annoying filler.

You’d assume people would use tools to deliver a better, well-composed message; instead, most people use LLMs to decompress their text into an inefficient representation. Why this is I have no idea, but I’d rather have the raw, unfiltered thought from a fellow human than someone trying to sound fancy and important.

That said, I still find the 20% claim a little suspect.

You do realize it wasn't "saved us 20%" but "Hetzner can sometimes be as low as 20% of an equivalent AWS bill", i.e., saved up to 80%?
While I agree that your characterisation is true for a lot of chatgpt output, it can also be true for a human explaining their nuanced point of view.
Most humans don't say a couple sentences and then re-summarize them 3 more times unless they are speaking to someone with a learning disability.
I've never operated a kubernetes cluster except for a toy dev cluster for reproducing support issues.

One day it broke because of something to do with certificates (not that it was easy to determine the underlying problem). There was plenty of information online about which incantations were necessary to get it working again, but instead I nuked it from orbit and rebuilt the cluster. From then on I did this every few weeks.

A real kubernetes operator would have tooling in place to automatically upgrade certs and who knows what else. I imagine a company would have to pay such an operator.
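A minimal sketch of the kind of check such tooling runs, assuming you already have a certificate's `notAfter` string in the format Python's `ssl.SSLSocket.getpeercert()` returns. The 30-day threshold is an arbitrary assumption. If the cluster was built with kubeadm, `kubeadm certs check-expiration` reports expiry dates for the certificates kubeadm manages.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate expires.

    `not_after` uses the 'notAfter' format from ssl.SSLSocket.getpeercert(),
    e.g. "Jun  1 12:00:00 2025 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def needs_renewal(not_after: str, threshold_days: float = 30, now: Optional[float] = None) -> bool:
    """True when the certificate expires within `threshold_days` (an assumed policy)."""
    return days_until_expiry(not_after, now) < threshold_days


print(needs_renewal("Jun 11 12:00:00 2025 GMT"))
```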

This.

I run BareMetalSavings.com[0], a toy for ballpark-estimating bare-metal/cloud savings, and the companies that have it hardest to move away from the cloud are those who are highly dependent on Kubernetes.

It's great for the devs but I wouldn't want to operate a cluster.

[0]: https://www.BareMetalSavings.com

That's just not how it works on any scale other than "toy"
Right, but certs get out of date unless somebody does something about it, that was my point.
Ceph is a bastard to run. It's expensive, slow and just not really ready. Yes, I know people use it, but compared to a fully grown-up system (i.e. Lustre [don't, it's RAID 0 in prod] or GPFS [great but expensive]), it's just a massive time sink.

You are much better off having a bunch of smaller filesystems exported over NFS; just make sure you have block-level replication. Single-address-space filesystems are OK and convenient, but most of the time they are not worth the admin cost of making them reliable at scale. Like a DB, shard your filesystems, especially as you can easily add mapping logic to Kubernetes to make sure the right storage gets to the right image.
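A minimal sketch of that mapping logic, assuming a fixed list of NFS exports (the hostnames and paths below are hypothetical). It hashes a workload name to pick an export, then builds a plain Kubernetes PersistentVolume manifest pointing at it. With simple modulo hashing, adding a shard moves most workloads to a different export, so an explicit lookup table or consistent hashing may be the better choice in practice.

```python
import hashlib
from typing import Dict, List

# Hypothetical exports; substitute your own NFS servers and paths.
EXPORTS: List[str] = [
    "nfs-a.internal:/export/shard0",
    "nfs-b.internal:/export/shard1",
    "nfs-c.internal:/export/shard2",
]


def export_for(workload: str, exports: List[str] = EXPORTS) -> str:
    """Deterministically map a workload name to one NFS export."""
    # sha256 is stable across processes, unlike Python's built-in hash() for str.
    digest = hashlib.sha256(workload.encode("utf-8")).digest()
    return exports[int.from_bytes(digest[:8], "big") % len(exports)]


def pv_manifest(workload: str, size: str = "50Gi") -> Dict:
    """Build a PersistentVolume manifest (as a dict) backed by the chosen export."""
    server, path = export_for(workload).split(":", 1)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": f"{workload}-pv"},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # Assumes a per-workload subdirectory already exists on the export.
            "nfs": {"server": server, "path": f"{path}/{workload}"},
        },
    }


print(pv_manifest("billing-db"))
```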

I saw that Hetzner is beta testing ceph-based object storage. This could make the setup much easier. Anyone tested this already?
I agree that it is hideously complicated (to anyone saying “just use Rook,” I’ll counter that if you haven’t read through Ceph’s docs in full, you’re deluding yourself that you know how to run it), but given that CERN uses it at massive scale, I think it’s definitely prod-ready.
Oh it probably is prod ready, I just wouldn't use it unless I had to (ie I had the staff to look after it and no money to buy something better)

Whether it's a good fit for general-purpose storage at a small scale is a harder question. It's not easy to get good performance at small scale, and getting good performance requires a larger number of storage nodes than you'd like.

Yes, it has inline FEC (https://www.ibm.com/docs/en/storage-ceph/7?topic=components-...), but it's a lot of layers to get to a filesystem.
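In Ceph terms, that FEC is erasure coding: an erasure-code profile splits each object into `k` data chunks plus `m` coding chunks. The arithmetic below is a sketch, assuming a host-level failure domain so each chunk lands on a different host. It shows why small clusters struggle: a 4+2 profile uses half the raw capacity of 3-way replication, but needs at least six hosts.

```python
from typing import Dict


def erasure_coded(k: int, m: int) -> Dict[str, float]:
    """Capacity arithmetic for a k+m erasure-coded pool with a host failure domain."""
    return {
        "raw_per_usable": (k + m) / k,  # raw bytes consumed per byte stored
        "chunks_lost_without_data_loss": m,
        "min_hosts": k + m,             # one chunk per host
    }


def replicated(size: int) -> Dict[str, float]:
    """Same arithmetic for a replicated pool keeping `size` full copies."""
    return {
        "raw_per_usable": float(size),
        "chunks_lost_without_data_loss": size - 1,
        "min_hosts": size,
    }


for label, profile in [("EC 4+2", erasure_coded(4, 2)), ("3x replica", replicated(3))]:
    print(label, profile)
```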

Personally, I'd have a redundant array of storage nodes and be done with it. It's easier to debug a single server than three layers of Ceph weirdness.

I mostly agree, but it surprises me that people don't often consider a solution right in the center, such as OpenShift. You have much, much less burden for DevOps and all the power and flexibility of running on bare metal. It's a great hybrid between a fully managed, expensive service and a complete build-your-own. It is expensive, though, so for startups it is likely not a good option, but if you have a cluster with at least 72 GB of RAM or 36 CPUs going (about 9 mid-size nodes), you should definitely consider something like OpenShift.
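The commenter's threshold works out to roughly 8 GB of RAM and 4 CPUs per node (72 / 9 and 36 / 9). A small sketch of that rule of thumb; the thresholds come from the comment, not from any Red Hat sizing guidance, and the node list is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# The commenter's rule of thumb, not an official sizing guideline.
RAM_THRESHOLD_GB = 72
CPU_THRESHOLD = 36


@dataclass
class Node:
    ram_gb: int
    cpus: int


def worth_considering_openshift(nodes: List[Node]) -> bool:
    """True when the cluster reaches either the RAM or the CPU threshold."""
    total_ram = sum(n.ram_gb for n in nodes)
    total_cpus = sum(n.cpus for n in nodes)
    return total_ram >= RAM_THRESHOLD_GB or total_cpus >= CPU_THRESHOLD


# Nine hypothetical mid-size nodes with 8 GB and 4 CPUs each.
print(worth_considering_openshift([Node(ram_gb=8, cpus=4)] * 9))
```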
Manually updating k8s clusters is a huge tradeoff. I can’t imagine doing that to save a couple bucks unless I was desperate
I dunno, I've had to spend like two or three hours each month on updating mine for its entire lifetime (of over 5 years now), and that includes losing entire nodes to hardware failure and spinning up new ones.

Originally it was ansible, and so spinning up a new node or updating all nodes was editing one file (k8s version and ssh node list), and then running one ansible command.

Now I'm using nixos, so updating is just bumping the version number, a hash, and typing "colmena apply".

Even migrating the k8s cluster from ansible to nixos was quite easy, I just swapped one node at a time and it all worked.
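A minimal sketch of that one-node-at-a-time approach as a plan of commands. The `kubectl drain`, `kubectl wait`, and `kubectl uncordon` steps are standard kubectl usage; the per-node upgrade command is a hypothetical hook standing in for whatever actually updates the node (an Ansible play, a Colmena deploy, and so on). The sketch only builds and prints the plan; it does not run anything.

```python
import shlex
from typing import Callable, List


def rolling_update_plan(
    nodes: List[str], upgrade_cmd: Callable[[str], List[str]]
) -> List[List[str]]:
    """Build the command sequence for updating nodes strictly one at a time."""
    plan: List[List[str]] = []
    for node in nodes:
        # Evict workloads so the node can be updated safely.
        plan.append(["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"])
        # Hypothetical per-node update step supplied by the caller.
        plan.append(upgrade_cmd(node))
        # Wait until the node reports Ready again before moving on.
        plan.append(["kubectl", "wait", "--for=condition=Ready", f"node/{node}", "--timeout=10m"])
        # Allow scheduling on the node again.
        plan.append(["kubectl", "uncordon", node])
    return plan


# Hypothetical hook: replace with your real Ansible or Colmena invocation.
def upgrade_node(node: str) -> List[str]:
    return ["./upgrade-node.sh", node]


plan = rolling_update_plan(["node-1", "node-2", "node-3"], upgrade_node)
for cmd in plan:
    print(shlex.join(cmd))
```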

People are so afraid of just learning basic Linux sysadmin operations, and yet that knowledge also makes it way easier to understand and debug the system, so it pays off.

I had to help someone else with their EKS cluster, and in the end debugging the weird EKS AMI was a nightmare and required spending more time than all the time I've had to spend on my own cluster over the last year combined.

From my perspective, using EKS costs more money, gives you a worse K8s (you can't use beta features, and their AMI sucks), and pushes you toward a worse understanding of the system, so you can't understand bugs as easily, and when it breaks it's worse.

If the "couple of bucks" ends up being the cost of an entire team, then hire a small team to do it.

Then get mad at them because they don't "produce value", and fold it into a developers job with an even higher level of abstraction again. This is what we always do.

We at https://syself.com have made a platform with "one-click updates". 100% vanilla Kubernetes on Hetzner.
The "couple bucks" in my experience were difference between viable business and bankrupt one - including time spent on maintaining k8s!
> Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.

Sure, but the TL;DR is going to be that if you employ n or more sysadmins, with 2 < n < 7, the cost savings will dominate. So for a given company size, Hetzner will start being cheaper at some point, and the gap will become more extreme the bigger you go.
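One way to frame the break-even point is in terms of spend: if self-managing needs a fixed number of extra ops staff, the savings grow with the bill while the staff cost doesn't. A sketch under that simplifying assumption; the headcount and salary in the example are hypothetical, and the 0.2 ratio is the "as low as 20%" figure from the top comment.

```python
def break_even_monthly_bill(
    extra_staff: float, monthly_cost_per_staff: float, hetzner_ratio: float
) -> float:
    """Monthly cloud bill above which self-managing covers its extra staff.

    Monthly savings = bill * (1 - hetzner_ratio); extra staff cost is treated as fixed.
    """
    if not 0 <= hetzner_ratio < 1:
        raise ValueError("hetzner_ratio must be in [0, 1)")
    return extra_staff * monthly_cost_per_staff / (1 - hetzner_ratio)


# Hypothetical: two extra engineers at 10,000/month each, Hetzner at 20% of the AWS bill.
print(break_even_monthly_bill(2, 10_000, 0.2))
```

The model ignores outage costs and the volume discounts large providers offer, both of which come up further down the thread.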

Second, if you have a "big" cost, whatever it is (bandwidth, disk space, essentially anything but compute), cost savings will dominate faster.

Not always. Employing sysadmins doesn't mean Hetzner is cheaper, because those "sysadmin/ops type people" are being hired to manage the Kubernetes cluster. And ops people who truly know Kubernetes are not cheap.

Sure, you can get away with legoing some K3S stuff together for a while but one major outage later, and that cost saving might have entirely disappeared.

More than that: the more you use, the more discounts you can get from a major CSP, which would also reduce the TCO for using a managed service.
Even a short outage can completely wipe out any savings.
Is it just me or do the last 3 paragraphs feel like ChatGPT output?
I used GPT-4o to fix all my spelling and grammar mistakes. Maybe it went a little too far, but this is 100% my comment.
> this is 100% my comment

No, it is not.

Isn't the point of ChatGPT to mimic sentences written by humans?
Kind of. But which humans? It's a bit like how the average person doesn't exist, except in the LLM world, now it does.
GPT-4 is, but ChatGPT is fine-tuned to emit sentences that get rated well (by humans, and by raters trained to mimic human evaluation) in a conversational agent context.
Yeah, I was wondering the same thing.