	by ericbarrett 1957 days ago
	The selling point of GKE etc. is “minimal to no maintenance,” but of course somebody else is doing the maintenance and the customer is paying a premium for it. Says great things about Nomad.

omneity 1957 days ago

Yeah, when making the decision it was quite harrowing to think of maintaining a cluster in production. Nomad had very little operational complexity compared to what we imagined.

We've had two main outages in months:

- Server disks were filling up and we hadn't set up monitoring properly at the time (ironic for the name of our company :) ). Not Nomad's fault.

- A faulty healthcheck caused all the servers of a cluster to restart at the same time, which caused complete loss of the cluster state, so all the jobs were gone. I like to call it a collective amnesia of the servers.
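
For the first outage, a minimal sketch of the kind of disk-space check that could raise an alert before a server disk fills up. It uses only Python's standard library; the function names, the `/` mount point, and the 85% threshold are illustrative assumptions, not anything described in the thread.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the fraction of the filesystem at `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_disk_nearly_full(used_fraction, threshold=0.85):
    """Return True when usage meets or exceeds the (hypothetical) alert threshold."""
    return used_fraction >= threshold


if __name__ == "__main__":
    fraction = disk_usage_fraction("/")
    if is_disk_nearly_full(fraction):
        # In a real setup this would page someone or feed a monitoring system.
        print(f"WARNING: disk is {fraction:.0%} full")
    else:
        print(f"OK: disk is {fraction:.0%} full")
```

Run periodically (for example from cron or as a scheduled job), a check like this is a stopgap until proper monitoring is in place.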

We're still looking for a good/reliable logging and tracing solution though. Nomad has a great dashboard, but only with basic logging, and it only gets you so far.

Overall, would recommend again!

sofixa 1957 days ago

Jaeger is pretty great for tracing, and can integrate with Traefik/Envoy (or whatever you use for ingress/inter-service communication).

We're running Loki for the logs (via a Nomad log forwarder/shipper and Promtail) and so far it's going great. I'll have to do a write-up about the whole thing.

omneity 1957 days ago

Thank you for the pointers, very helpful. I'd love to see that write-up too!

mr-karan 1957 days ago

I'd love to see your write-up on the logging thing. Please do!

davestephens 1957 days ago

Would love to see that write-up!
