| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by hvindin 2518 days ago

Where I am at the moment we're running clusters of 400-800 containers sitting behind nginx instances and even thought we own nginx+ licenses, we've found the consul-template + SIGHUP route to be totally fine, even at a churn of maybe a dozen contained a minute everything still seems to be working fine. If a particularly busy node dies then we occassionally see a few requests get errors back, but Nginx's passive healthchecking (ie. checking response codes and not sending traffic to an upstream with a ton of 500's being returned) seems to handle all of that ok.

The only time our tried and tested consul-template + SIGHUP method is every unsuccessful (and we've ended up jusy having to put processes in place to stop this) is if we have the same nginx handling inbound connections to the cluster under high load and we try and respawn all the containers at once. Then things start to go wrong for 5 minutes or so then back to normal.

While "the occasions error response" isn't perfect, I suspect that for most use cases it's good enough, so I'd still be interested in knowing more specifically what happened to that nginx...