by ddreier 1950 days ago
The Nomad Day N experience is pretty good. I help maintain 4 Nomad clusters running approx. 40k Allocations (like Pods) for work. We have basically no problems with Nomad itself, and it requires pretty much no day-to-day intervention. Upgrades are pretty painless too. We've gone from 0.7.x to 0.12.x with these same clusters, and will be going to 1.x soon.

Happy to try to answer specific questions.


Do you run other services (Vault, Consul, etc.) for service discovery, configuration management, etc.?

Genuinely curious about the load of managing this on the infrastructure team.

Yep, we run the full stack. Consul for service discovery and as the storage backend for Vault. We use Vault for config, PKI, Nomad/Consul ACL auth, and we're just starting to experiment with MSSQL dynamic credentials.

Of the three systems, Vault probably takes the most time and effort, and that's probably only a few hours per month. We've struggled a bit with Vault performance, at least partially because the Consul backend is shared with service discovery.

All of the VMs are built and managed with Terraform using images built with Packer+Ansible. We also use the Nomad/Consul/Vault Terraform providers to apply and manage their configurations.

We have an SRE/Platform Engineering team of 12 (and hiring) that's responsible for the overall orchestration platform, which also includes Prometheus/Thanos/Grafana for metrics and ELK for logs.

Hope that's helpful!