I wonder what HN's devops people think about this wrt the current trend of containers and immutable infrastructure. Hot code reloading seems to be directly at odds with the idea of immutable infrastructure, because the application code essentially becomes state. So your container becomes stateful, instead of you swapping out your old appserver container for a new one.
What's your opinion? Ditch Docker and put the Erlang VM on the host OS? Ditch hot code loading and swap containers the usual way? Some middle ground?
Hot code reloading is always more work than just blue-green and should be avoided unless necessary. For example, the author of Learn You Some Erlang writes [1]:
> if you can avoid the whole procedure (which will be called relup from now on) and do simple rolling upgrades by restarting VMs and booting new applications, I would recommend you do so.
Erlang grew out of challenges faced by the telecoms industry, such as: what do you do when blue-green isn't an option? Think of an in-use packet switch that is the only point of contact between two networks. There is no way to take the switch down for maintenance without some interruption in service, which gets messy when dealing with timeouts. In his thesis, Armstrong gives another example [2]:
> Usually in a sequential system, if we wish to change the code, we stop the system, change the code and re-start the program. In certain real-time control systems, we might never be able to turn off the system in order to change the code and so these systems have to be designed so that the code can be changed without stopping the system. An example of such a system is the X2000 satellite control system developed by NASA.
This power comes at a cost, though. LYSE again:
> It is said that divisions of Ericsson that do use relups spend as much time testing them as they do testing their applications themselves. They are a tool to be used when working with products that can imperatively never be shut down.
The point being, hot code reloading is an additional feature that can come in handy, but for most of HN's audience it probably won't be relevant; its cost outweighs its benefits compared with just doing a blue-green deploy.
On the contrary, the implementation of Erlang's hot code reloading forces state to be separated from the code. If you look at Erlang's gen_server, every call requires you to return a new State object, which is passed to the next function call.
In other words, you can compare Erlang's virtual machine to a container itself, and everything old is new again!
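To make the gen_server point concrete, here is a minimal sketch. The module name `counter` and its API are made up for illustration; only the `gen_server` callbacks themselves are standard OTP. Every callback receives the current state and returns the next one, and `code_change/3` is the hook where state written by old code could be converted during a release upgrade.

```erlang
-module(counter).
-behaviour(gen_server).

%% Hypothetical client API, for illustration only.
-export([start_link/0, increment/1, value/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, 0, []).

increment(Pid) ->
    gen_server:call(Pid, increment).

value(Pid) ->
    gen_server:call(Pid, value).

%% The state is just a term (here an integer). It lives in the
%% process, not in the module.
init(Initial) ->
    {ok, Initial}.

%% Each clause returns the reply and the state for the next call.
handle_call(increment, _From, Count) ->
    NewCount = Count + 1,
    {reply, NewCount, NewCount};
handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.

%% Invoked during a release upgrade; a new version could reshape
%% the old state here. This sketch passes it through unchanged.
code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

Because the count is threaded through the callbacks rather than stored in the code, a new version of the module can pick up where the old one left off.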
Hot code reloading is best used for a scenario where you cannot afford to restart your system, usually because it drags a lot of internal state around and reconstructing that state is expensive.
Typical use cases include several gigabytes of in-memory state that takes a long time to read in and warm up on redeploy, or a large number of long-running TCP connections.
For most other uses, we just do rolling upgrades in Erlang as everyone else is doing. It is somewhat simpler to get to work, and immutable architecture is to a certain extent easier to manipulate.
You have to start with the problem. If your problem is solved by highly available and stateful services, then the Erlang VM on the host seems like a good idea. If the availability doesn't matter that much or the services can be made stateless without much pain - go for the containers.
We're using Erlang as the primary language environment for our IoT product for a lot of reasons but one big one is: Hot code loading and a very robust release upgrade environment with a lot of control over the process (including restarting everything inside the VM if that's what we wish to do).
For our product, a digital light switch / dimmer, a high uptime guarantee is a very important requirement, and Erlang has it all plus many other wonderful features.
Attended a talk by one of the creators of Erlang a couple of weeks ago. Very passionate about achieving maximum uptime for applications written in his language. This is one of the features that makes that possible... Fascinating stuff.
Interesting. The hot code reloading functionality in Erlang led me to investigate Ruby (my preferred dev language) a bit more.
You can do a hot code load in Ruby using the Kernel#load() call. It won't alter functionality currently on the call stack, but it will change the functionality of everything not on the call stack. With some sympathetic design, you can achieve hot code loading for high availability in Ruby.
This is cool. In what type of scenario could you not afford a few seconds of downtime on a server? For example, why not simply remove a machine from the cluster/nlb and upgrade it, then add it back ...?
When your server has several gigs of state. It's VERY useful on a dev server. Instead of waiting several minutes for reload, I just load in new code manually (typically I change only 1-2 files per reload). If something breaks - hey, it's only dev server. Erlang's other feature - almost everything runs in isolation - helps with this. If something breaks, it breaks only in one place, so most of the time I only need to make small changes and reload once more. The rest of the system does what it needs to without any degradation.
> Instead of waiting several minutes for reload, I just load in new code manually (typically I change only 1-2 files per reload). If something breaks - hey, it's only dev server.
Someone wrote a module for Elixir that uses inotify (and similar) to -I think- watch .beam files for modification and perform the required hot-reloads automatically.
I would be reluctant to run this in production, and I can see situations (even in development) where this could trigger unwanted code purging and would be disastrous, but it's a pretty neat thing to have and -it seems- a must for Web Dev people.
This would be terrible in production: often there's an order in which you need to load the beam files, and I wouldn't want to add that to the compile step; it's very simple to load in the proper order. You could pretty easily use code:soft_purge/1 prior to loading to avoid killing lingering processes though, and then it would probably be reasonable for development.
Yeah it could be. Frankly, I'd likely reach for Erlang Releases before I reached for this when updating software in production.
However, for a large variety of dev work, this automatic module reloading thingie works pretty well. :)
> You could pretty easily use code:soft_purge/1 prior to loading to avoid killing lingering processes though...
Mmm. Okay. So, I'm not 100% on how this works, so please bear with me and my inaccurate terminology. :(
In any given Erlang system, there can be two versions of a module running, the "current" one, and the "old" one, right?
So, if you call code:soft_purge/1 when there is no "old" code loaded, it should return true, yes? (In addition to returning true when there's no process running the "old" code.) [0]
So, would this be a way to write an auto-loader that doesn't purge in-use code?
* code:soft_purge(?MODULE)
* if false, wait a while then retry
* if true, code:load_file(?MODULE)
I guess maybe you'd want to build up a list of all the modules that have been modified, and wait until code:soft_purge/1 returns true for all of them before loading the modules. (maybe.)
You also -obviously- want an override that allows for the purging of in-use code.
[0] Testing indicates that it does, but it's often good to double-check. :)
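The three steps above could be sketched roughly like this. The module name `soft_reload`, the retry count, and the wait time are my own assumptions; only `code:soft_purge/1`, `code:load_file/1` and `timer:sleep/1` are standard library calls. It deliberately has no override for purging in-use code.

```erlang
-module(soft_reload).
-export([reload/3]).

%% Try to soft-purge Module, then load the current .beam from the
%% code path. If some process still runs the old code, sleep WaitMs
%% and retry, up to Retries more times.
%% Returns {module, Module} or {error, Reason} from code:load_file/1,
%% or {error, old_code_in_use} once the retries are used up.
reload(Module, Retries, WaitMs) ->
    case code:soft_purge(Module) of
        true ->
            %% No old version remains, so loading cannot kill anyone.
            code:load_file(Module);
        false when Retries > 0 ->
            timer:sleep(WaitMs),
            reload(Module, Retries - 1, WaitMs);
        false ->
            {error, old_code_in_use}
    end.
```

Something like `soft_reload:reload(my_mod, 5, 1000)` would then try for about five seconds before giving up (`my_mod` being a placeholder for one of your own modules).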
Yes, you've got the concepts and implementation correct.
The exact strategy for reloading (wait for all at once, load whatever is ready, how long to wait, etc) is left as an exercise for the reader. For dev, I use a function in the shell that loads everything that changed (no soft purge); for prod, I have a function that goes in order and checks soft purge, then loads (if the 2nd module doesn't soft purge, it will have already loaded the first module, but it will stop before trying the 3rd).
With most things in gen_servers, there's not a lot of opportunity for lingering code, but sometimes it happens.
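For anyone curious, the "go in order, check soft purge, then load, stop at the first failure" approach described for prod might look something like the following. This is only a sketch of that description, not the actual function; the module name `ordered_reload` and the shape of the return values are invented here.

```erlang
-module(ordered_reload).
-export([reload_in_order/1]).

%% Modules must be given in the order they should be loaded.
%% Returns ok when every module was soft-purged and loaded, or
%% {stopped_at, Module, Reason} for the first one that failed.
%% Modules earlier in the list stay loaded; later ones are not tried.
reload_in_order([]) ->
    ok;
reload_in_order([Module | Rest]) ->
    case code:soft_purge(Module) of
        false ->
            %% Some process still runs the old version of Module.
            {stopped_at, Module, old_code_in_use};
        true ->
            case code:load_file(Module) of
                {module, Module} -> reload_in_order(Rest);
                {error, Reason}  -> {stopped_at, Module, Reason}
            end
    end.
```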
Not the original asker, but it would be nice to be able to upgrade between OTP releases without having to restart my application (but I have no expectation of that ever being possible when there are VM changes, and it's unlikely that the effort will be spent to do it for the user-space Erlang bits either). I have to use DNS for load balancing [1] and big Mnesia tables, so I have to wait a long time for traffic to drain, and then another long time for the application to start back up.
Working for 4 years in an Erlang environment where hotloading is the norm makes me wish for it everywhere! Why do I have to reboot to fix kernel bugs in TCP? :(
[1] the load balancers I have access to where we host had more downtime than our hosts, so not actually helpful
Yes, you can reload all the beam files (even if you have to unsticky them -- you can do that), but that doesn't mean r16 -> r17 didn't make changes you can't safely load. Some parts of OTP have state upgraders, and some don't; some may depend on bif/nif changes as well.
>> I have to use DNS for load balancing [because the load balancers fall over often].
>Oh lord. That's a terrible situation to be in.
It's not terrible, it's just not great. Our hosting environment is generally very reliable, so if I don't screw things up, my systems won't fall over. It's just that their load balancers are crap, and the suggested upgrade path was to run a load balancer in a VM appliance, which makes me think maybe I should just run CARP myself on the hosts (or something) instead, and skip a layer. But I'll probably never get around to that, because it doesn't come up that often :)
> ...but that doesn't mean r16 -> r17 didn't make changes you can't safely load.
Oh yeah. I know. I would expect that for some-to-many minor things (like the TCP bug you mentioned), this wouldn't be an insane way to do a hot upgrade. Guess one would need to make a close study of the diffs.
Regardless, I mentioned that this isn't a good idea for the benefit of any novice who might stumble across this comment months or years hence and get the wrong idea. :)
> ...maybe I should just run CARP myself on the hosts...
Oh, CARP is so cool. It's a bit of a pity that ucarp doesn't support IPv6. But, how does CARP replace a load balancer? Isn't it used for single host availability, or is my understanding too narrow?
I would assume the longer any VM is running the higher the chances of a service degrading. I guess this is mostly due to memory leaks or bit rot. I have no Erlang VM experience, so my comment was geared towards VMs in general.
> I guess this is mostly due to memory leaks or bit rot.
"Bit rot"? The only defense against bit rot that I'm aware of is ECC RAM.
Anyway. AFAIK (and I'm no Erlang expert, so there's probably something pertinent that I don't know) unless there's a resource leak in core Erlang code, resource leaks can be fixed by restarting the leaking application, or killing the leaking process. [0]
[0] Erlang software is often broken up into Applications. [1] An application is a collection of code with a well-known entry point that (ideally) does a particular thing. An application can depend on other applications and the services that they provide. Applications can be started and stopped independently of all others in the system, but -in order to keep running- dependent applications need to be designed to handle the temporary absence of an application that they depend on.
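As a rough illustration of restarting a single leaking application without touching the rest of the VM, something like this should do. The module name `app_restart` is made up; `application:stop/1` and `application:start/1` are the standard calls. Note that `application:start/1` starts the application as `temporary`, which may differ from how your release started it.

```erlang
-module(app_restart).
-export([restart/1]).

%% Stop and then start one OTP application; every other application
%% in the VM keeps running. Returns ok, or the {error, Reason} tuple
%% from application:stop/1 or application:start/1.
restart(App) ->
    case application:stop(App) of
        ok -> application:start(App);
        {error, _} = Error -> Error
    end.
```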
When you have a memory leak in your code, you can typically just drop the used resources from the console or even write some code to do it periodically. Erlang tracing is so good that finding memory leaks is rather easy. There WAS one situation where the VM was leaking data (actually this was improper usage ;)) and it was fixed rather fast after people started to complain.