by toast0 3136 days ago
So, there are maybe two parts to your question: how do you structure your code so that it can be hot loaded, and how does that help you recover from crashes?

The BEAM VM can hold two versions of every module at once: the current version and an old one. A fully qualified call (Module:Function(...)) always goes to the current version. A call made within a module by function name alone goes to the version that is already executing, which may be the old one. So you need to make a fully qualified call periodically (or on demand, triggered by some message) to make sure your process migrates. You also need to make sure the old version doesn't stay on the stack, so your loop has to be tail recursive, at least at the point where it makes that fully qualified call. This matters because when yet another version is loaded, the old code is purged and any process still running it is killed. Finally, your new code has to cope with state built up by the old code, which can be challenging at times.
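The two-version rule can be made concrete with a toy model. Python has no BEAM-style code loading, so this is a simulation and not how the VM is implemented. `CodeServer`, `Process`, `resolve` and the `Counter` classes are hypothetical names. `resolve` stands in for a fully qualified call, an `"upgrade"` message plays the role of the on-demand migration message, and `upgrade()` plays the role of a state-conversion step (in OTP, gen_server's `code_change/3` callback serves that purpose). The model does not capture the stack, so it says nothing about tail recursion.

```python
class Killed(Exception):
    """Raised when a process is still running code that has been purged."""


class CodeServer:
    """Toy module table: at most a current and an old version per module."""

    def __init__(self):
        self.current = {}    # module name -> current version
        self.old = {}        # module name -> previous version
        self.purged = set()  # versions that have been thrown away

    def load(self, name, version):
        # Loading demotes current -> old; an existing old version is purged.
        if name in self.old:
            self.purged.add(self.old[name])
        if name in self.current:
            self.old[name] = self.current[name]
        self.current[name] = version

    def resolve(self, name):
        # Like a fully qualified Module:Function(...) call: always current.
        return self.current[name]


class CounterV1:
    vsn = 1

    def handle(self, state, msg):
        return state + 1  # v1 keeps a bare integer


class CounterV2:
    vsn = 2

    def upgrade(self, old_state):
        # New code must cope with state built up by old code.
        return old_state if isinstance(old_state, dict) else {"count": old_state}

    def handle(self, state, msg):
        return {"count": state["count"] + 1}  # v2 keeps a dict


class Process:
    def __init__(self, code, name, state):
        self.code, self.name, self.state = code, name, state
        self.version = code.resolve(name)  # the version this process runs

    def receive(self, msg):
        # The real VM kills lingering processes at purge time; this model
        # only notices on the next message.
        if self.version in self.code.purged:
            raise Killed(f"{self.name} v{self.version.vsn} was purged")
        if msg == "upgrade":
            # Fully qualified call: jump to whatever version is current now.
            new = self.code.resolve(self.name)
            if new is not self.version:
                self.state = new.upgrade(self.state)
                self.version = new
        else:
            # Local call: stays in the version that is already executing.
            self.state = self.version.handle(self.state, msg)


# Usage
code = CodeServer()
code.load("counter", CounterV1())
server = Process(code, "counter", 0)
lagger = Process(code, "counter", 0)   # never migrates
server.receive("tick")
code.load("counter", CounterV2())      # v1 is now the old version
server.receive("tick")                 # local call: still runs v1
server.receive("upgrade")              # fully qualified: moves to v2
server.receive("tick")
code.load("counter", CounterV2())      # another load purges v1
try:
    lagger.receive("tick")
except Killed as exc:
    print("lagger:", exc)
print(server.version.vsn, server.state)
```

The process that migrated keeps running with converted state; the one that stayed on v1 is killed once v1 is purged.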

If your service is generally stable but occasionally crashes on certain types of requests, you're in a good place. If something crashes a lot, the restarts can exceed the supervisor's restart intensity (too many restarts within a set period); the supervisor then shuts down, the failure cascades upward, and you are likely to have a bad day. In theory, when your service starts (whether you started it or the supervisor restarted it), it has a consistent state and can service requests; but often it started crashing because some service it depends on stopped working correctly, and restarting the client doesn't really help.
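How a crash loop escalates can be sketched with a toy supervisor that allows at most `intensity` restarts within a sliding window of `period` seconds and gives up after that. The parameter names mirror OTP's supervisor flags, but the `Supervisor` class and `child_crashed` method are hypothetical and only approximate the real bookkeeping. A real OTP supervisor that gives up terminates its children and then itself, passing the failure to its own supervisor.

```python
from collections import deque


class SupervisorGaveUp(Exception):
    """Too many restarts: the supervisor stops and escalates to its parent."""


class Supervisor:
    def __init__(self, start_child, intensity, period):
        self.start_child = start_child  # hypothetical child factory
        self.intensity = intensity      # max restarts allowed ...
        self.period = period            # ... within this many seconds
        self.restarts = deque()         # timestamps of recent restarts
        self.child = start_child()

    def child_crashed(self, now):
        # Forget restarts that have slid out of the time window.
        while self.restarts and now - self.restarts[0] > self.period:
            self.restarts.popleft()
        if len(self.restarts) >= self.intensity:
            # One more restart would exceed the budget: give up instead.
            raise SupervisorGaveUp(
                f"more than {self.intensity} restarts in {self.period}s")
        self.restarts.append(now)
        self.child = self.start_child()


# Usage: occasional crashes are absorbed; a crash loop escalates.
sup = Supervisor(start_child=object, intensity=3, period=5)
for t in (0, 10, 20, 30):
    sup.child_crashed(t)        # spread out: always restarted

try:
    for t in (40, 40.5, 41, 41.5):
        sup.child_crashed(t)    # e.g. a dependency the child needs is down
    escalated = False
except SupervisorGaveUp:
    escalated = True
print("escalated:", escalated)
```

Restarting does nothing about the broken dependency, so the child keeps crashing until the restart budget runs out.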

I've found "let it crash" to be a good philosophy, but it shouldn't always be implemented literally. In an HTTP server, I'd rather catch crashes, log them, and return an error to the client than just close the socket. For Erlang server processes that keep little state and run in pg2 process groups, it's better to catch and log, because if the process actually crashes, any requests still queued in its mailbox are lost.
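A minimal Python model of why a literal crash loses requests: a worker drains a queue standing in for an Erlang mailbox. `run_worker`, `handle` and the `(request, reply)` message shape are assumptions made for illustration. Without catching, the first failing request ends the worker and everything still queued is discarded, as a dead Erlang process's mailbox is. With `catch_crashes=True` the failure is logged and that caller gets an error reply, much as an HTTP server would answer with an error instead of closing the socket.

```python
import logging
from collections import deque

log = logging.getLogger("worker")


def run_worker(mailbox, handle, catch_crashes):
    """Process queued (request, reply) pairs; return how many were lost."""
    while mailbox:
        request, reply = mailbox.popleft()
        try:
            result = handle(request)
        except Exception:
            if not catch_crashes:
                # Literal "let it crash": the worker dies and everything still
                # queued dies with it; in this model those callers get no reply.
                lost = 1 + len(mailbox)
                mailbox.clear()
                return lost
            # Catch, log, and answer the caller with an error instead.
            log.exception("request %r crashed", request)
            reply(("error", "internal error"))
        else:
            reply(("ok", result))
    return 0


# Usage
def handle(n):
    return 10 // n  # n == 0 stands in for a request that crashes


def mailbox_with(replies):
    return deque((n, replies.append) for n in (1, 0, 2))


literal, caught = [], []
lost = run_worker(mailbox_with(literal), handle, catch_crashes=False)
run_worker(mailbox_with(caught), handle, catch_crashes=True)
print("lost:", lost, literal)
print("caught:", caught)
```

The literal version answers only the first request; the catching version answers all three, one of them with an error.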