by jcrites 4194 days ago
Interesting technique! I can see this being useful in applications that are single points of failure. In redundant systems, however, I generally prefer to solve this problem upstream of the application, in the load balancer, by routing traffic around each machine while it is being deployed to. I have found that approach quite effective.

First step of a deployment: shift traffic away from the machine, while allowing outstanding requests to complete gracefully. Next, install the new software or perform any other upgrade actions in isolation, so that the costs of the deployment don't impair the performance of real traffic. Then bring the new version up (and prewarm it if necessary). Finally, direct the load balancer to resume sending traffic to the machine. We call the general idea "bounce deployments", and it is a feature of our deployment engine.
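
To make the ordering concrete, here is a rough sketch of one bounce in Python. It is not our actual deployment engine: `LoadBalancer`, `Backend`, `bounce_deploy` and the `install` callback are hypothetical names, and the in-memory `LoadBalancer` only stands in for whatever admin interface a real LB exposes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Backend:
    name: str
    version: str
    in_rotation: bool = True
    in_flight: int = 0  # requests accepted but not yet completed


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real LB's admin interface."""
    backends: Dict[str, Backend] = field(default_factory=dict)

    def add(self, backend: Backend) -> None:
        self.backends[backend.name] = backend

    def set_rotation(self, name: str, enabled: bool) -> None:
        self.backends[name].in_rotation = enabled

    def active(self) -> List[str]:
        return sorted(n for n, b in self.backends.items() if b.in_rotation)


def bounce_deploy(
    lb: LoadBalancer,
    name: str,
    new_version: str,
    install: Callable[[Backend, str], None],
    drain_timeout: float = 5.0,
    poll_interval: float = 0.01,
) -> List[str]:
    """Drain one machine, upgrade it in isolation, then resume traffic."""
    backend = lb.backends[name]
    steps: List[str] = []

    # 1. Stop sending new requests, then let outstanding ones finish.
    lb.set_rotation(name, False)
    steps.append("drain")
    deadline = time.monotonic() + drain_timeout
    while backend.in_flight > 0:
        if time.monotonic() >= deadline:
            # Assumption: on timeout we leave the machine out of rotation
            # and let the operator decide what to do next.
            raise TimeoutError(
                f"{name} still has {backend.in_flight} requests in flight"
            )
        time.sleep(poll_interval)

    # 2. Upgrade (and prewarm, if needed) while no real traffic is affected.
    install(backend, new_version)
    steps.append("install")

    # 3. Put the machine back into rotation.
    lb.set_rotation(name, True)
    steps.append("resume")
    return steps
```

Running this over the machines one at a time keeps the rest of the fleet serving while each one is out of rotation. The drain timeout is a design choice: a real system might instead force-close long-lived connections after a grace period.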

Two advantages of having a general-purpose LB solution:

(1) You can apply it to any application or protocol, regardless of whether the server supports this type of socket handoff. Though to be fair, some protocols are more difficult to load balance than others - but most can be done, with some elbow grease (even SSH).

(2) It's possible to run smoke tests and sanity tests against the new app instance, such that you can abort bad deployments with no impact. Our deployment system has a hook for sanity tests to be run against a service after it comes up. These can verify its function before the instance is put back into the LB, and are sometimes used to warm up caches. If you view defects and bad deployments as inevitable, then the ability to "reject" a new app version in production with no outage impact is a great safety net. With the socket handover, your new server must function perfectly, immediately, or else the service is impaired. (Unless you keep the old version running and can hand the socket back?)
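
The sanity-test hook amounts to a gate in front of the "put it back into the LB" step. A minimal sketch, assuming the pool is just a set of host names and each check is a callable that takes a host; `readmit_if_healthy` and `SanityCheck` are made-up names, not our system's API:

```python
from typing import Callable, Iterable, Set

# Hypothetical check signature: takes a host name, returns True on success.
SanityCheck = Callable[[str], bool]


def readmit_if_healthy(
    in_rotation: Set[str],
    host: str,
    checks: Iterable[SanityCheck],
) -> bool:
    """Return the host to the pool only if every sanity check passes."""
    for check in checks:
        try:
            passed = check(host)
        except Exception:
            # A crashing check counts as a failed check.
            passed = False
        if not passed:
            # Reject the deployment: the host stays out of rotation,
            # so real traffic never reaches the bad version.
            return False
    in_rotation.add(host)
    return True
```

Because the host is only added after the last check passes, a bad build costs some capacity on that one machine but causes no outage. A cache warm-up can be run as just another check that always returns True.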

(By LB I don't necessarily mean a hardware LB. A software load balancer suffices as well - or any layer acting as a reverse proxy with the ability to route traffic away from a server automatically.)

A technique like this would also be useful for implementing single points of failure such as load balancers or databases, so that they can be upgraded without an outage, though failover or a DNS flip is usually also an option.