by busterarm 2 days ago
Nomad, Consul and Vault all running on VMs that you manage with Terraform.

The problem is that when you run this long enough you want K8s features anyway.

And your starter “production” deployment of the Nomad/Consul/Vault stack is literally 12 VMs, comprising three independent Raft clusters. There is no decent way to do zero-downtime instance replacement without building your own orchestration layer, and they also have a years-long track record of shipping bad upgrades and following up with only manual remediations or workarounds instead of a fix.

As someone who has productionized and maintained truly hundreds of those clusters across several jobs, it is hard at this point for me to recommend Consul, Nomad, or Vault to anyone serious about building reliable applications. Too many broken upgrades and manual click-ops tasks just to keep them online. (…and I’ve said nothing of the actual product!)

This is a timely post. We are going to use Consul to replace the need for Internal Load Balancers. What issues do you have with it?
I'm in a similar boat and only somewhat agree. The gist of my post was that this exists but maybe just use Kubernetes anyway.

I don't entirely agree with your statement about zero-downtime instance replacement though. We built our Terraform around doing one-at-a-time instance replacement, and removing/adding nodes in HashiCorp Raft clusters is pretty much the easiest thing I've ever done with infrastructure.
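
To make the one-at-a-time idea concrete, here is a rough sketch as a pure-Python simulation of the quorum arithmetic. It makes no real Terraform, Consul, Nomad, or Vault calls. The function names and the add-before-remove ordering are assumptions for illustration, not a description of our actual setup.

```python
def quorum(voters: int) -> int:
    """Raft majority: strictly more than half of the voting members."""
    return voters // 2 + 1


def rolling_replace(old_nodes, new_nodes):
    """Simulate replacing servers one at a time (hypothetical helper).

    Assumption: each new server joins before its old counterpart leaves.
    Returns a list of (action, members, failures_tolerated) tuples.
    """
    if len(old_nodes) != len(new_nodes):
        raise ValueError("need exactly one replacement per existing node")
    members = list(old_nodes)
    steps = []
    for old, new in zip(old_nodes, new_nodes):
        # Add the replacement first so the cluster never shrinks below its
        # starting size.
        members.append(new)
        steps.append((f"add {new}", tuple(members),
                      len(members) - quorum(len(members))))
        # Only then remove the old server.
        members.remove(old)
        steps.append((f"remove {old}", tuple(members),
                      len(members) - quorum(len(members))))
    return steps


if __name__ == "__main__":
    for action, members, tolerated in rolling_replace(["a", "b", "c"],
                                                      ["d", "e", "f"]):
        print(f"{action:10} members={members} failures_tolerated={tolerated}")
```

Under those assumptions a three-server cluster alternates between three and four voters, so it can tolerate one failure at every step of the replacement.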

That's really always been the biggest selling point of HashiCorp's stuff for me. They made bootstrap and maintenance operations easy enough that a caveman could do them. Even recovering from problems isn't terribly hard unless you're already doing something stupid (Roblox outage).

I have also deployed and managed _hundreds_ of these over the last 8 years or so, and I'm not really having the same problems that you are. But we don't upgrade to the latest and greatest, because it _does_ take them a few versions to get their feature launches correct. This is mainly a Nomad problem now though -- Consul and Vault are pretty brainless to operate.

Still though, we _also_ use Kubernetes and I prefer it. Most of our software engineers don't, though, because they don't actually want to take the time to understand it; they just want to run binaries and forget about it.