by toast0 2672 days ago
> As for hot code reload, I've never seen why you would need that since you can use blue / green or canary / rolling deployment, the only reason I see is to keep some state in your app, which I think is a terrible idea.

Most applications have at least some connection state, if only a TCP connection. Disconnecting a million clients and having them reconnect is disruptive at minimum. Certainly, your service environment needs to be able to handle this anyway [1] in case of node failure, but if you do a rolling restart of frontends, many active clients will have to reconnect multiple times, which adds load to your servers as well as your clients. Actually disconnecting users cleanly takes time too, so a full restart deploy will take a lot longer than a hot code reload, unless you have enough spare capacity to deploy a full new system, move users in one step, and then kill the old system.
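To illustrate the "reconnect multiple times" point, here is a rough simulation sketch. It assumes (this is not from the comment) that frontends are restarted one at a time and that each disconnected client immediately reconnects to a uniformly random other frontend; the function name and parameters are made up for the example.

```python
import random

def rolling_restart_reconnects(num_clients, num_frontends, seed=0):
    """Return how many times each client reconnects during a rolling restart.

    Assumptions (illustrative only): frontends restart one at a time, in
    order, and a disconnected client reconnects right away to a uniformly
    random frontend other than the one being restarted.
    """
    if num_frontends < 2:
        raise ValueError("need at least two frontends for a rolling restart")
    rng = random.Random(seed)
    # Initial placement: each client sits on a random frontend.
    location = [rng.randrange(num_frontends) for _ in range(num_clients)]
    reconnects = [0] * num_clients
    for restarting in range(num_frontends):
        others = [f for f in range(num_frontends) if f != restarting]
        for client in range(num_clients):
            if location[client] == restarting:
                # The client is kicked off and lands on some other frontend,
                # which may itself still be waiting for its restart.
                location[client] = rng.choice(others)
                reconnects[client] += 1
    return reconnects
```

Under these assumptions every client reconnects at least once, and any client that lands on a frontend that has not been restarted yet gets disconnected again later. A hot code reload that keeps the sockets open would leave every count at zero.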

Certainly, hot loading can introduce more failure modes, but most of those are things you already need to think about in a distributed system -- just not usually within a single node; e.g., what happens if a call from the old version hits the new version?
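As a sketch of the old-version-calls-new-version case: one common way to cope is for the new code to keep accepting the old message shape until all callers are upgraded. The message shapes and the `handle_call` name below are hypothetical, and this is Python rather than anything BEAM-specific.

```python
def handle_call(state, message):
    """Handle a request that may come from old or new code.

    Hypothetical message shapes:
      old: ("get", key)
      new: ("get", key, default)
    """
    if not message or message[0] != "get":
        raise ValueError(f"unknown message: {message!r}")
    if len(message) == 2:
        # Old callers do not send a default; fall back to None.
        _, key = message
        default = None
    elif len(message) == 3:
        # New callers send an explicit default.
        _, key, default = message
    else:
        raise ValueError(f"malformed message: {message!r}")
    return state.get(key, default)
```

The same concern shows up between nodes during a rolling deploy, which is why it is not really unique to hot loading.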

[1] There are some techniques to provide TCP handling, but I'd be surprised to hear if anyone is using them at a large scale.


It depends on what you mean by state; I was talking about internal state in the application. Your example is about network state, like websockets, not REST APIs (what 99.9% of people use). Even with that, it's easy to roll out new connections with a canary deployment, and with a load balancer in front you replace old instances with new ones with no disruption while you drain the old instances. Even if the connection is cut, your client logic should have a proper reconnection mechanism.
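For what a "proper reconnection mechanism" might look like, here is a minimal sketch using exponential backoff with full jitter, so that many clients cut off at once do not all retry at the same instant. The `connect` callable, the function names, and the default values are assumptions for the example, not any particular library's API.

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))

def reconnect(connect, max_attempts=8, sleep=time.sleep, rng=random):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is a hypothetical callable that returns a connection object
    or raises ConnectionError on failure.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the failure.
            # Wait a randomized, growing delay before the next try.
            sleep(backoff_delay(attempt, rng=rng))
```

The jitter matters at scale: without it, a million clients dropped by the same restart would come back in synchronized waves.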

Hot code reload is imo a bad practice and should be avoided.

Hot code reload is imo an enabling practice, and should be done everywhere possible. Restart to reload may be useful or practically required for some deployments, and it's sort of a test of cold start, but it's so disruptive and time consuming. I've done deploys both ways, and the time spent removing a host from the load balancer, waiting for the server to drain, and adding it back is time I won't get back. You can do a lot more deploys in a day when a deploy takes seconds, which means you can deploy smaller changes and confirm as you go.

If it's disruptive and time consuming, it means you don't use the right process / tools. If your CI/CD pipeline is properly set up (and it's actually easy to do), you don't have to do anything.

https://kubernetes.io/docs/tutorials/kubernetes-basics/updat...

That's the power of Kubernetes, and since it's very popular the community and tooling are great, good luck replicating that with BEAM.