by 3np 1957 days ago
We moved from k8s to Nomad at my workplace, and I'm currently running almost all my self-hosted software on a 10-node Nomad cluster (with Consul and Vault) as well. The servers for each of the three leave plenty of resource headroom when run on any recentish arm64 SBC, so you can get an HA cluster inexpensively.

If you integrate properly with them (which does take quite a bit of work with the ACL policies and certs), it really starts shining. With Terraform, of course.

For these core services themselves and other OS packages, I use Ansible, mostly because of the wealth of community resources.

It's fun and doesn't come with all the cognitive overhead of k8s. I'm a fan and will tell everyone they should consider Nomad.

It's obviously less mature, though. One thing that has been frustrating for a while is the networking configuration - a simple thing like controlling which network interface a service should bind to (internal or external?) was supposedly introduced in 0.12 but completely broken until 1.0.2 (current version is 1.0.3).

Consul Connect is really awesome conceptually to make a service mesh, but is also just coming together.

There are really only two things I miss dearly now:

1) Exposing Consul Connect services made by Nomad (aka ingress gateway). It seems to be theoretically doable but requires manual configuration of Connect and Envoy. If you want to expose a service from the mesh through e.g. an HTTP load balancer, you need to either expose it raw (losing the security benefits) or manually plumb it (no load balancer seems to play nicely with Connect without a lot of work, yet).

2) Recognize that UDP is a protocol people actually use in 2021. This is a critique of the whole industry.

2 comments

What Ansible resources do you use for Nomad, Consul and Vault though? I've found a few but they all seem to lag behind releases. None is up to date with Nomad 1.0.3. It would be nice to have some half-way standard way of setting it up, like the quite a few projects k8s has for this.
I'm running a Traefik instance on each node, so that I can expose a service by adding a bunch of labels. The load balancer is not part of the cluster and routes the traffic to the nodes. You might want to consider that too :)
I'm doing exactly that, actually!

Two bastion hosts/lbs sharing a virtual IP (keepalived), with two Traefik instances each (private and public). I actually schedule them through Nomad (on the host network interfaces) as well - since they solved the host networking issue I mentioned above it's properly set up with service checks. Super smooth to add and change services with consul catalog, and ACME-TLS included.
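For anyone wondering what "adding a bunch of labels" looks like with the Consul catalog: here is a rough Python sketch that generates the tags instead of hand-typing them. The key names follow Traefik v2's routing label scheme; the function name, the "websecure" entrypoint and the resolver name are my own placeholders, not anything from the setup described above.

```python
from typing import List, Optional


def traefik_tags(name: str, host: str,
                 entrypoint: str = "websecure",
                 cert_resolver: Optional[str] = None) -> List[str]:
    """Build Traefik v2 routing tags for one service.

    `entrypoint` and `cert_resolver` must match names defined in your own
    Traefik static configuration; the defaults here are placeholders.
    """
    router = f"traefik.http.routers.{name}"
    tags = [
        "traefik.enable=true",
        f"{router}.rule=Host(`{host}`)",
        f"{router}.entrypoints={entrypoint}",
    ]
    if cert_resolver is not None:
        # Ask Traefik to obtain the certificate via the named ACME resolver.
        tags.append(f"{router}.tls.certresolver={cert_resolver}")
    return tags


if __name__ == "__main__":
    for tag in traefik_tags("registry", "registry.example.com", cert_resolver="le"):
        print(tag)
```

The resulting strings would go into the `tags` list of the service in the Nomad job, which is how they end up in the Consul catalog.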

Things I don't like that make me want to try envoy instead:

* It eats A LOT of CPU once the number of peers ramps up - there's an issue on their GH suggesting this was introduced in 2.1.

* UDP is completely broken. For the time being I'm doing per-job nginxes for that until I have a better solution.

* It's very brittle. Did you ever misspell the wrong thing in your label? If so you probably also wondered why half of your services all stopped working as Traefik arbitrarily let it hijack everything.

* The thing I mentioned above with Consul Connect. Each service can integrate with either but not both.

It was great for months though, but I guess I grew out of it just by the time I started properly understanding how all the configuration actually works (:
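On the brittleness point: one way to catch a misspelled label before it reaches the cluster is a small pre-deploy check. This is only a sketch under my own assumptions. The allow-list covers a handful of Traefik v2 keys, not the full schema, and `lint_tags` is a hypothetical helper, not part of Traefik or Nomad.

```python
import re
from typing import List

# Hypothetical allow-list: a few common Traefik v2 keys, not the full schema.
ALLOWED_KEYS = [
    re.compile(r"traefik\.enable"),
    re.compile(
        r"traefik\.http\.routers\.[a-z0-9_-]+\."
        r"(rule|entrypoints|middlewares|service|tls|tls\.certresolver)"
    ),
    re.compile(
        r"traefik\.http\.services\.[a-z0-9_-]+\."
        r"loadbalancer\.server\.(port|scheme)"
    ),
]


def lint_tags(tags: List[str]) -> List[str]:
    """Return one message per suspicious Traefik tag; empty list if none."""
    problems = []
    for tag in tags:
        key, sep, _value = tag.partition("=")
        key = key.lower()
        if not key.startswith("traefik."):
            # Not a Traefik tag (or the prefix itself is misspelled,
            # which this sketch cannot detect).
            continue
        if not sep:
            problems.append(f"{tag!r}: missing '=value'")
            continue
        if not any(p.fullmatch(key) for p in ALLOWED_KEYS):
            problems.append(f"{tag!r}: unrecognized key")
    return problems


if __name__ == "__main__":
    for message in lint_tags(["traefik.http.routers.web.rul=Host(`a.example.com`)"]):
        print(message)
```

It only checks key shapes, so a wrong but well-formed rule still gets through.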

I recently set up Traefik 2.x to front a self-hosted Docker Registry, with automated Let's Encrypt renewals - I found the config to be really unintuitive and confusing! It feels like an awful lot of really finicky config for such a simple setup. Next time I'll try something else.
Traefik v1 is much simpler; v2 seemed to introduce so many extra layers, which make the simple stuff harder.
Caddy v2 seems to be doing something right, although I don't think it comes with the same number of features out of the box. Plus it's more of an HTTP reverse proxy.