Hot code reloading with Erlang | HN Mirror

	Hot code reloading with Erlang (medium.com)
	73 points by kansi 3852 days ago

7 comments

skrebbel 3851 days ago

I wonder what HN's devops people think about this wrt the current trend of containers and immutable infrastructure. Hot code reloading seems to be directly at odds with the idea of immutable architecture, because essentially the application code becomes state. So your container becomes stateful, instead of you swapping out your old appserver container for a new one.

What's your opinion? Ditch Docker and put the Erlang VM on the host OS? Ditch hot code loading and swap containers the usual way? Some middle ground?

greenleafjacob 3851 days ago

Hot code reloading is always more work than just blue-green and should be avoided if possible. For example, the author of Learn You Some Erlang writes [1]:

> if you can avoid the whole procedure (which will be called relup from now on) and do simple rolling upgrades by restarting VMs and booting new applications, I would recommend you do so.

Erlang grew out of challenges faced by the telecom industry, such as: what do you do when blue-green isn't an option? Think of an in-use packet switch that is the only point of contact between two networks. There is no way to take the switch down for maintenance without some interruption in service, which gets messy when dealing with timeouts. In the Armstrong thesis paper he gives another example [2]:

> Usually in a sequential system, if we wish to change the code, we stop the system, change the code and re-start the program. In certain real-time control systems, we might never be able to turn off the system in order to change the code and so these systems have to be designed so that the code can be changed without stopping the system. An example of such a system is the X2000 satellite control system developed by NASA.

This power comes at a cost, though. LYSE again:

> It is said that divisions of Ericsson that do use relups spend as much time testing them as they do testing their applications themselves. They are a tool to be used when working with products that can imperatively never be shut down.

The point being, hot code reloading is an additional feature that can come in handy, but for most of HN's audience it probably won't be relevant; the cost outweighs the benefit compared with just doing a blue-green deploy.

[1] http://learnyousomeerlang.com/relups#the-hiccups-of-appups-a...

[2] http://www.erlang.org/download/armstrong_thesis_2003.pdf

stingraycharles 3851 days ago

On the contrary, the implementation of Erlang's hot code reloading forces state to be separated from the code. If you look at Erlang's gen_server, every call requires you to return a new State object, which is passed to the next function call.

In other words, you can compare Erlang's virtual machine with a container itself, and everything old is new again!
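
A minimal sketch of what that state threading looks like, assuming a hypothetical `counter` module (the module name and its `increment/0` API are made up for illustration; the gen_server callbacks themselves are standard OTP). The state lives in the process and is handed to each callback, so the module's code can be replaced while the state carries over:

```erlang
-module(counter).
-behaviour(gen_server).

%% Hypothetical public API, for illustration only.
-export([start_link/0, increment/0]).
%% Standard gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

increment() ->
    gen_server:call(?MODULE, increment).

%% The state is a plain value owned by the process, not by the module.
init([]) ->
    {ok, 0}.

%% Each callback receives the current state and returns the next one.
handle_call(increment, _From, Count) ->
    NewCount = Count + 1,
    {reply, NewCount, NewCount}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.

%% Gives new code a chance to convert state left behind by old code.
code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

Note that `code_change/3` only runs when the process is explicitly told to upgrade, for example by the release handler. Simply loading a new version of the module leaves the state untouched, and the next callback runs the new code.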

jlouis 3851 days ago

Hot code reloading is best used for a scenario where you cannot afford to restart your system, usually because it drags a lot of internal state around and reconstructing that state is expensive.

Typical use cases include several gigabytes of in-memory state, which takes a long time to read in and get hot when redeploying, or a large number of long-running TCP connections.

For most other uses, we just do rolling upgrades in Erlang as everyone else is doing. It is somewhat simpler to get to work, and immutable architecture is to a certain extent easier to manipulate.

dm3 3851 days ago

You have to start with the problem. If your problem is solved by highly available and stateful services, then the Erlang VM on the host seems like a good idea. If the availability doesn't matter that much or the services can be made stateless without much pain - go for the containers.

Ixiaus 3851 days ago

As always: depends on the use case.

We're using Erlang as the primary language environment for our IoT product for a lot of reasons but one big one is: Hot code loading and a very robust release upgrade environment with a lot of control over the process (including restarting everything inside the VM if that's what we wish to do).

For our product, a digital light switch / dimmer, high uptime guarantees are a very important requirement, and Erlang has it all plus many other wonderful features.

Fuddh 3851 days ago

Attended a talk by one of the creators of Erlang a couple of weeks ago. Very passionate about achieving maximum uptime for applications written in his language. This is one of the features that makes that possible... Fascinating stuff.

stevegh 3851 days ago

Interesting. The hot code reloading functionality in Erlang led me to investigate Ruby (my preferred dev language) a bit more.

You can do a hot code load in Ruby using the Kernel#load() call. It won't alter functionality currently on the call stack, but it will change the functionality of everything not on the call stack. With some sympathetic design, you can achieve hot code loading for high availability in Ruby.

pmontra 3851 days ago

You can use that to replace code by monkey patching

    $ cat hi.rb 
    def method
      puts "hi"
    end
    method
    load("hello.rb")
    method

    $ cat hello.rb 
    def method
      puts "hello"
    end

    $ ruby hi.rb 
    hi
    hello

You must engineer your application to execute the load method and that's it. However I wonder if this is really equivalent to what Erlang does. I remember http://rvirding.blogspot.it/2008/01/virdings-first-rule-of-p...

pmontra 3850 days ago

Interesting post at http://blog.rkh.im/code-reloading

amelius 3851 days ago

> Hot code loading is the art of replacing an engine from a running car without having to stop it.

Except you can clone the car into a controlled environment, and test the whole procedure, before doing the actual replacing.

guiomie 3851 days ago

This is cool. In what type of scenarios could you not afford a few seconds of downtime on a server? For example, why not simply remove a machine from the cluster/nlb and upgrade it, then add it back ...?

yetihehe 3850 days ago

When your server has several gigs of state. It's VERY useful on a dev server. Instead of waiting several minutes for reload, I just load in new code manually (typically I change only 1-2 files per reload). If something breaks - hey, it's only dev server. Erlang's other feature - almost everything works alone - helps with this. If something breaks, it breaks only in one place, so most of the time I only need to make small changes and reload once more. The rest of the system does what it needs without any downgrades.

simoncion 3850 days ago

> Instead of waiting several minutes for reload, I just load in new code manually (typically I change only 1-2 files per reload). If something breaks - hey, it's only dev server.

Someone wrote a module for Elixir that uses inotify (and similar) to -I think- watch .beam files for modification and perform the required hot-reloads automatically.

I would be reluctant to run this in production, and I can see situations (even in development) where this could trigger unwanted code purging and would be disastrous, but it's a pretty neat thing to have and -it seems- a must for Web Dev people.

toast0 3849 days ago

This would be terrible in production -- often there's an order in which you need to load the beam files, and I wouldn't want to add that to the compile step; it's very simple to load them in the proper order. You could pretty easily use code:soft_purge/1 prior to loading to avoid killing lingering processes though, and then it would probably be reasonable for development.

simoncion 3849 days ago

> This would be terrible in production...

Yeah it could be. Frankly, I'd likely reach for Erlang Releases before I reached for this when updating software in production.

However, for a large variety of dev work, this automatic module reloading thingie works pretty well. :)

> You could pretty easily use code:soft_purge/1 prior to loading to avoid killing lingering processes though...

Mmm. Okay. So, I'm not 100% on how this works, so please bear with me and my inaccurate terminology. :(

In any given Erlang system, there can be two versions of a module running, the "current" one, and the "old" one, right?

So, if you call code:soft_purge/1 when there is no "old" code loaded, it should return true, yes? (In addition to returning true when there's no process running the "old" code.) [0]

So, would this be a way to write an auto-loader that doesn't purge in-use code?

* code:soft_purge(?MODULE)

* if false, wait a while then retry

* if true, code:load_file(?MODULE)

I guess maybe you'd want to build up a list of all the modules that have been modified, and wait until code:soft_purge/1 returns true for all of them before loading the modules. (maybe.)

You also -obviously- want an override that allows for the purging of in-use code.
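
The steps above could be sketched roughly like this. The module name `safe_reload`, the retry count, and the wait time are assumptions for illustration; `code:soft_purge/1` and `code:load_file/1` are the real calls:

```erlang
-module(safe_reload).
-export([reload/1, reload/3]).

%% Hypothetical defaults: retry up to 10 times, 1 second apart.
reload(Module) ->
    reload(Module, 10, 1000).

%% Give up once the retries are used up; nothing has been purged or loaded.
reload(_Module, 0, _WaitMs) ->
    {error, old_code_in_use};
reload(Module, Retries, WaitMs) ->
    case code:soft_purge(Module) of
        true ->
            %% No process is running old code, so nothing gets killed.
            %% Returns {module, Module} or {error, Reason}.
            code:load_file(Module);
        false ->
            %% Some process still runs the old version: wait, then retry.
            timer:sleep(WaitMs),
            reload(Module, Retries - 1, WaitMs)
    end.
```

This sketch has no override. A forced variant would call `code:purge/1` instead, which kills any process still running the old code.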

[0] Testing indicates that it does, but it's often good to double-check. :)

toast0 3848 days ago

Yes, you've got the concepts and implementation correct.

The exact strategy for reloading (wait for all at once, load whatever is ready, how long to wait, etc.) is left as an exercise for the reader. For dev, I use a function in the shell that loads everything that changed (no soft purge). For prod, I have a function that goes in order and checks soft purge, then loads (if the 2nd module doesn't soft purge, it will have already loaded the first module, but it will stop before trying the 3rd).
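
A rough sketch of what such a prod function might look like, assuming a hypothetical `ordered_reload` module and that the caller supplies the modules in the required load order (this is an illustration, not the actual function described):

```erlang
-module(ordered_reload).
-export([reload_all/1]).

%% Reload modules in the given order, stopping at the first one that
%% cannot be soft-purged or fails to load. Modules earlier in the list
%% stay loaded; later ones are never attempted.
reload_all([]) ->
    ok;
reload_all([Module | Rest]) ->
    case code:soft_purge(Module) of
        true ->
            case code:load_file(Module) of
                {module, Module} ->
                    reload_all(Rest);
                {error, Reason} ->
                    {error, {load_failed, Module, Reason}}
            end;
        false ->
            {error, {old_code_in_use, Module}}
    end.
```

Returning the offending module in the error tuple makes it clear where the sequence stopped, so the remaining modules can be retried later.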

With most things in gen_servers, there's not a lot of opportunity for lingering code, but sometimes it happens.
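
For context on where lingering code comes from: a process only switches to a newly loaded version of a module when it makes a fully qualified call (`Module:Function(...)`). gen_server invokes your callbacks that way, but a hand-written receive loop has to do it explicitly. A minimal sketch, assuming a hypothetical `ticker` module:

```erlang
-module(ticker).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% loop/1 must be exported so the fully qualified calls below can reach it.
loop(Count) ->
    receive
        {get, From} ->
            From ! {count, Count},
            %% Fully qualified call: continues in the newest loaded version.
            ?MODULE:loop(Count);
        tick ->
            ?MODULE:loop(Count + 1)
    end.
```

Had the loop used a local call (`loop(Count)`), the process would keep running the version it started in, `code:soft_purge/1` would return false once that version became old, and a hard purge would kill the process.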

jgalt212 3851 days ago

You know what would be really amazing is if you could restart the Erlang VM, or load a new VM, without interrupting any of the running code modules.

dozzie 3851 days ago

That's what distributed Erlang is for.

simoncion 3851 days ago

What -exactly- do you want to do when you say you want to restart the Erlang VM?

I'm asking because I don't have enough context to know why you want to do what you're asking to do.

toast0 3849 days ago

Not the original asker, but it would be nice to be able to upgrade between OTP releases, without having to restart my application (but I have no expectation of that being possible ever when there are VM changes, and unlikely for the effort to be spent to do it for the user-space erlang bits either). I have to use DNS for load balancing [1] and big mnesia tables, so I have to wait a long time for traffic to drain, and then another long time for the application to start back up.

Working for 4 years in an Erlang environment where hotloading is the norm, makes me wish for it everywhere! Why do I have to reboot to fix kernel bugs in tcp? :(

[1] the load balancers I have access to where we host had more downtime than our hosts, so not actually helpful

simoncion 3849 days ago

ISTR that you can reload core Erlang modules, but that there's some sticky_directory stuff that prevents it from happening by default.

I'm pretty sure that I can reload the inet module: [0]

    Eshell V7.0  (abort with ^G)
    1>  l(inet).
    {error,sticky_directory}
    
    =ERROR REPORT==== 6-Dec-2015::03:37:00 ===
    Can't load module 'inet' that resides in sticky dir
    2> code:which(inet).
    "/usr/lib/erlang/lib/kernel-4.0/ebin/inet.beam"
    3> code:unstick_dir("/usr/lib/erlang/lib/kernel-4.0/ebin/"). 
    ok
    4> l(inet).
    {module,inet}
    5>

Not that this is a good idea, mind, but I'm fairly certain that it's doable. :)

(Also note that you can reload the mnesia module without hassle. Its ebin directory is not marked as sticky. :) )

> I have to use DNS for load balancing [because the load balancers fall over often].

Oh lord. That's a terrible situation to be in.

[0] Which is part of the kernel application, which is one of the applications that hot upgrades require that you restart the emulator to upgrade. [1]

[1] http://www.erlang.org/doc/system_principles/upgrade.html

toast0 3848 days ago

Yes, you can reload all the beam files (even if you have to unsticky them -- you can do that), but that doesn't mean r16 -> r17 didn't make changes you can't safely load. Some parts of OTP have state upgraders, and some don't; some may depend on bif/nif changes as well.

>> I have to use DNS for load balancing [because the load balancers fall over often].

>Oh lord. That's a terrible situation to be in.

It's not terrible, it's just not great. Our hosting environment is generally very reliable, so if I don't screw things up, my systems won't fall over. It's just that their load balancers are crap; the suggested upgrade path was to run a load balancer in a VM appliance, which makes me think maybe I should just run CARP myself on the hosts (or something) instead, and skip a layer. But I'll probably never get around to that, because it doesn't come up that often :)

simoncion 3848 days ago

> ...but that doesn't mean r16 -> r17 didn't make changes you can't safely load.

Oh yeah. I know. I would expect that for some-to-many minor things (like the TCP bug you mentioned), this wouldn't be an insane way to do a hot upgrade. I guess one would need to make a close study of the diffs.

Regardless, I mentioned the fact that this isn't a good idea for any novice who might stumble across this comment months or years hence and get the wrong idea. :)

> ...maybe I should just run CARP myself on the hosts...

Oh, CARP is so cool. It's a bit of a pity that ucarp doesn't support IPv6. But, how does CARP replace a load balancer? Isn't it used for single host availability, or is my understanding too narrow?

jgalt212 3850 days ago

I would assume the longer any VM is running the higher the chances of a service degrading. I guess this is mostly due to memory leaks or bit rot. I have no Erlang VM experience, so my comment was geared towards VMs in general.

simoncion 3850 days ago

> I guess this is mostly due to memory leaks or bit rot.

"Bit rot"? The only defense for the bit rot I'm aware of is ECC RAM.

Anyway. AFAIK (and I'm no Erlang expert, so there's probably something pertinent that I don't know) unless there's a resource leak in core Erlang code, resource leaks can be fixed by restarting the leaking application, or killing the leaking process. [0]

[0] Erlang software is often broken up into Applications. [1] An application is a collection of code with a well-known entry point that (ideally) does a particular thing. An application can depend on other applications and the services that they provide. Applications can be started and stopped independently of all others in the system, but -in order to keep running- dependent applications need to be designed to handle the temporary absence of an application that they depend on.

[1] http://learnyousomeerlang.com/building-applications-with-otp

yetihehe 3850 days ago

When you have a memory leak in your code, you can typically just drop used resources from the console or even write some code to do it periodically. Erlang tracing is so good that finding memory leaks is rather easy. There WAS one situation where the VM was leaking data (actually this was improper usage ;)) and it was fixed rather fast after people started to complain.

Grue3 3851 days ago

Seems like a lot of work. In Common Lisp I can just press C-c C-c in SLIME over the changed function and it goes live.