Not the original asker, but it would be nice to be able to upgrade between OTP releases without having to restart my application (though I don't expect that to ever be possible when there are VM changes, and it's unlikely anyone will spend the effort to do it for the user-space Erlang bits either). I have to use DNS for load balancing [1] and big mnesia tables, so I have to wait a long time for traffic to drain, and then another long time for the application to start back up.
Working for 4 years in an Erlang environment where hotloading is the norm makes me wish for it everywhere! Why do I have to reboot to fix kernel bugs in tcp? :(
[1] the load balancers I have access to where we host had more downtime than our hosts, so not actually helpful
Yes, you can reload all the beam files (even if you have to unsticky them -- you can do that), but that doesn't mean r16 -> r17 didn't make changes you can't safely load. Some parts of OTP have state upgraders, and some don't; some may depend on bif/nif changes as well.
>> I have to use DNS for load balancing [because the load balancers fall over often].
>Oh lord. That's a terrible situation to be in.
It's not terrible, it's just not great. Our hosting environment is generally very reliable, so if I don't screw things up, my systems won't fall over. It's just that their load balancers are crap, and the suggested upgrade path was to run a load balancer in a VM appliance. That makes me think I should just run CARP myself on the hosts (or something) and skip a layer, but I'll probably never get around to it, because it doesn't come up that often :)
> ...but that doesn't mean r16 -> r17 didn't make changes you can't safely load.
Oh yeah. I know. I would expect for some-to-many minor things (like the TCP bug you mentioned), this wouldn't be an insane way to do a hot upgrade. Guess one would need to make a close study of the diffs.
Regardless, I mentioned the fact that this isn't a good idea for any novice who might stumble across this comment months or years hence and get the wrong idea. :)
> ...maybe I should just run CARP myself on the hosts...
Oh, CARP is so cool. It's a bit of a pity that ucarp doesn't support IPv6. But, how does CARP replace a load balancer? Isn't it used for single host availability, or is my understanding too narrow?
Oh, I meant an OS kernel bug; no way to hotload those.
I have two servers. If I get two CARP-able IPs, make each server primary on one, and put them both in DNS, I have load balancing and failover. If I just use one IP, at least I have failover; I need each server to be able to handle full load anyway, so I could run hot-warm instead of hot-hot.
(With more than two servers, I'd need to figure something else out, probably two boxes running a simple load balancer in front of the rest of the cluster.)
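To make the two-IP idea concrete, here is a toy model in Python. It is not CARP itself, only the ownership rule I'd expect it to give: each virtual IP goes to the first live host in its preference list. The host names and the 192.0.2.x addresses are made up for illustration.

```python
def vip_owners(preferences, alive):
    """Return {vip: host}: each VIP goes to its first live host, or None if all are down."""
    return {
        vip: next((host for host in order if alive[host]), None)
        for vip, order in preferences.items()
    }


# Hypothetical setup: both VIPs are published in DNS, and each server
# is primary for one of them and backup for the other.
PREFERENCES = {
    "192.0.2.10": ["server_a", "server_b"],  # server_a is primary
    "192.0.2.11": ["server_b", "server_a"],  # server_b is primary
}

# Both servers up: DNS spreads clients over two VIPs, one per server (hot-hot).
healthy = vip_owners(PREFERENCES, {"server_a": True, "server_b": True})

# server_a down: server_b takes over both VIPs, so it has to handle full load.
degraded = vip_owners(PREFERENCES, {"server_a": False, "server_b": True})

print(healthy)
print(degraded)
```

With a single VIP the same rule gives plain failover (hot-warm): only one server owns the address at any time.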
I would assume the longer any VM is running the higher the chances of a service degrading. I guess this is mostly due to memory leaks or bit rot. I have no Erlang VM experience, so my comment was geared towards VMs in general.
> I guess this is mostly due to memory leaks or bit rot.
"Bit rot"? The only defense against bit rot that I'm aware of is ECC RAM.
Anyway. AFAIK (and I'm no Erlang expert, so there's probably something pertinent that I don't know) unless there's a resource leak in core Erlang code, resource leaks can be fixed by restarting the leaking application, or killing the leaking process. [0]
[0] Erlang software is often broken up into Applications. [1] An application is a collection of code with a well-known entry point that (ideally) does a particular thing. An application can depend on other applications and the services that they provide. Applications can be started and stopped independently of all others in the system, but -in order to keep running- dependent applications need to be designed to handle the temporary absence of an application that they depend on.
When you have a memory leak in your code, you can typically just drop the used resources from the console, or even write some code to do it periodically. Erlang tracing is so good that finding memory leaks is rather easy. There WAS one situation where the VM was leaking data (actually this was improper usage ;)) and it was fixed rather fast after people started to complain.