by lomnakkus 3265 days ago
> It's not a superset until it has non-shareable memory heaps between threads, complete and easy hot code reloading, dynamic tracing (being able to log into a node and update code at will as the application is running).

The first two I can definitely see -- particularly for robustness and debugging -- but I'm a bit surprised by the last one. Do people actually really log into running production systems and update code like this? It seems like it would be an incredibly dangerous thing to do. (Akin to using direct DB connections and typing in DELETE statements directly rather than e.g. putting them in SQL scripts first.) It could potentially also make it extremely hard to know what's actually running in production.


> Do people actually really log into running production systems and update code

Yes, I've done it once in a while. One case is deploying a fix while the customer's system is up and running, say for something urgent that can't wait to go through the full deployment pipeline. Because hot code reloading works so well in Erlang, it's not as risky as doing it in Java, for example.

In fact, upgrading by hot code reloading is also a common thing in the Erlang world, so there are cases where it is done routinely. It takes some preparation:

http://learnyousomeerlang.com/relups

Another case is if you see an issue happening but don't have enough logging or tracing ability in that part of the code. You can upgrade the code with an additional log statement, or save extra info to a file for debugging, and then remove the patch. The alternative is to try to replicate the problem on a separate system, which sometimes isn't easy: you don't have the exact access pattern, the exact data, and the other factors that would duplicate the original environment.

But you're right, doing it haphazardly and just sprinkling hot-patched code updates everywhere is a path to disaster. It's possible to monitor and record these updates to make them visible and better managed; it's up to the team / organization to handle that.

The bottom line: don't do it routinely, but when you have to, it can really save the day. And it's something that many (most!) frameworks / runtimes / languages don't support as well as Erlang does.
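
For anyone who hasn't seen it, here is a rough sketch of what a one-off hot patch can look like from a shell attached to the running node. The module name my_mod is made up, and this assumes the patched my_mod.beam has already been copied over the old file in the code path. A proper release upgrade (the relups link above) involves much more than this.

```erlang
%% Erlang shell attached to the running node.
%% my_mod is a hypothetical module name used only for illustration.

%% Record what is loaded before touching anything.
code:which(my_mod).          % path of the .beam the module was loaded from
my_mod:module_info(md5).     % checksum of the currently loaded version

%% Remove leftover old code only if no process is still running it.
%% If this returns false, stop here and find out which process is stuck.
code:soft_purge(my_mod).

%% Load the patched .beam from the code path.
%% Returns {module, my_mod} on success.
code:load_file(my_mod).

%% Confirm that a different version is now the current one.
my_mod:module_info(md5).
```

Comparing the checksum before and after (and writing both down somewhere) is one cheap way to keep track of what is actually running in production.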

You mentioned tracing. I'd like to expand on that a little.

What many people may be interested to know about Erlang is that you can log in to a production system, start a new shell running a tracer that listens on a localhost TCP socket[1], and use the dbg module in the production VM to trace calls and messages (and more) between any functions in any processes - in the running production system - and send them to the tracer node.

Done judiciously, the overhead is negligible, and the benefits are great. You can zoom in on bugs in real time.

I find the syntax of dbg match specs to be ugly, but it has saved my bacon so often it is so worth it, and it doesn't get mentioned that much, even though I feel it is almost as much a superpower as hot code loading.

[1] You use the separate shell to avoid accidentally crashing the production VM; if you do something boneheaded in the port-based shell, you can kill it and the production VM will just stop sending trace data to the dead TCP socket.
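
A minimal sketch of that setup, in case it helps. The port number 4711 and the module and function names are placeholders I picked for the example, not anything from a real system.

```erlang
%% --- On the production node ---
%% Send trace messages to a TCP port instead of the local shell.
%% 4711, my_mod and handle_request are hypothetical.
dbg:tracer(port, dbg:trace_port(ip, 4711)).

%% Enable call tracing for all processes.
dbg:p(all, [c]).

%% Trace calls to one function; x is the built-in match spec alias
%% that also reports return values and exceptions.
dbg:tpl(my_mod, handle_request, x).

%% --- In a separate Erlang shell on the same host ---
%% Connect to the trace port and print incoming trace messages.
dbg:trace_client(ip, {"localhost", 4711}).

%% --- Back on the production node, when finished ---
dbg:ctpl(my_mod, handle_request).   % clear the trace pattern
dbg:stop().                         % stop the tracer
```

Keeping the trace pattern narrow (one module, one function) is what keeps the overhead low; tracing every call in every process on a busy node is a good way to hurt yourself.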

Ah, right, I guess the "add logging" case is a pretty compelling and not-too-scary (for my sensibilities) one. Point also well taken about it being safer because of the shared-nothing nature of processes.

Thanks for the perspective!