by emschwartz 1033 days ago
My understanding is that Fly.io also started using Nomad but ended up running into big reliability issues at scale across many regions. I'm curious if you all are using it differently or haven't gotten to that scale yet.

I’d say we don’t use it in quite the same way: we don’t run a single global Nomad cluster, which is a critical difference.

We run one Nomad cluster per region, and we “federated” them ourselves with our own orchestrator. This reduces latency between the agents and their cluster, shrinks the failure domains, and avoids encoding every constraint in a single Nomad job definition.
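
To make the per-region layout concrete, here is a minimal sketch of what such an orchestrator could look like. It is not our actual code: `RegionCluster`, `build_job`, `plan_jobs`, `deploy`, the addresses, and the region names are all hypothetical, and using the region name as the Nomad datacenter is an assumption. The job payload only follows the general shape of the JSON that Nomad's job registration endpoint (`POST /v1/jobs`) accepts, and the actual HTTP call is left to an injected `submit` function so the sketch runs without a cluster.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RegionCluster:
    """One independent Nomad cluster (hypothetical descriptor)."""
    region: str
    nomad_addr: str  # hypothetical address of that region's Nomad servers


def build_job(service: str, image: str, region: str, count: int) -> dict:
    """Build a small, region-local job payload.

    Each region gets its own simple job instead of one global job that
    encodes every region's constraints.
    """
    return {
        "Job": {
            "ID": f"{service}-{region}",
            "Name": service,
            "Type": "service",
            # Assumption: the datacenter is named after the region.
            "Datacenters": [region],
            "TaskGroups": [{
                "Name": service,
                "Count": count,
                "Tasks": [{
                    "Name": service,
                    "Driver": "docker",
                    "Config": {"image": image},
                }],
            }],
        }
    }


def plan_jobs(service: str, image: str, instances: Dict[str, int]) -> Dict[str, dict]:
    """Split one global intent into one job per region."""
    return {
        region: build_job(service, image, region, count)
        for region, count in instances.items()
    }


def deploy(
    clusters: Dict[str, RegionCluster],
    jobs: Dict[str, dict],
    submit: Callable[[RegionCluster, dict], None],
) -> Dict[str, bool]:
    """Submit each job to its own cluster and record success per region.

    A failure in one region is caught and reported; the other regions
    are still attempted, so each cluster is its own failure domain.
    """
    results: Dict[str, bool] = {}
    for region, job in jobs.items():
        try:
            submit(clusters[region], job)
            results[region] = True
        except Exception:
            results[region] = False
    return results
```

In a real setup `submit` would send the payload to the cluster at `nomad_addr`; in a test it can be a stub that raises for one region to check that the others are unaffected.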

I'm not too worried about scaling with our setup, but the autoscaler's performance might become a concern in the future.