Out-of-the-box, it's only really designed to scale to a certain point. It's all generally nice, predictable, and with with low latency up until that point. But because the nodes are fully meshed, the TCP heartbeats alone kill performance once you go past that point. So for ex. 100 nodes gives you 5050 TCP connections (100+99+98+97...etc). As parent says, there are methods to deal with this (Riak Core would be an example). But they're non-trivial.
+ you possibly need to bear in mind the design goals: that Erlang is designed to run as a highly reliable, self contained system in _a single geographic location_, with that system possibly left to run on its own for long periods of time (years)
So, at one level, 5000 tcp connections is a lot, but at another level, some teams (including mine) are running hundreds of thousands of tcp connections to our clients from our front end Erlang nodes.
I've never thought about the dist heartbeats as a scalaing problem. If you have thousands of dist nodes, and your nodes have small memory, dist buffers for each connection to add up -- I think the default is 8mb, you can tune it, but it's a scaling concern. Especially, if you have nodes far apart from each other.
Really, the root design of Erlang was for two nodes colocated in a single chassis. That said, it turns out the design scales pretty well to much larger numbers of nodes, and nodes farther apart, but you have to be careful with some things. pg2:join and leave operate under a global lock, which will be slow if you have contention on the lock, or if one of your nodes has some problem where it's still up but very slow. Mnesia doesn't do well with queuing without a lot of help, schema operations under queuing is definitely a bad idea as well.
If you want to run Erlang at larger scales, you will need to be ready to poke around in OTP, and ocassionaly in BEAM as well. If you're running big systems, IMHO it makes the most sense for your Erlang nodes to fill your physical nodes, so I don't see much need for containers, but if you do use containers, you need to figure out how to get their names consistent for Erlang, or it's going to be confused. (OTP has a concept of a 'diskless' node which would seem to be a good fit for an ephemeral systems environment, but I must admit I haven't played with that)
> If you want to run Erlang at larger scales, you will need to be ready to poke around in OTP, and ocassionaly in BEAM as well.
That's essentially what I've had to do in my career as an Erlang engineer. Erlang requires way more massaging and work than the stories people tell about it would lead you to believe.
I don't think it's that much work. It's just that when you hit a wall, you have to fix it yourself. But many of the fixes are easy -- OTP usually does things in a very simple way, and sometimes something more complex is needed to scale beyond.
I think this is the case regardless of what languages or systems you use, but more well used systems may have more experts and more documentation to lean on.
For things that are a good fit for Erlang, it seems worth it to train up a couple people with deep internal knowledge of the VM you're using. As you said in another part of the thread, Erlang doesn't have a lot of abstraction -- most scaling problems aren't too many layers deep.
Yep, I kinda meant it as an illustration - the heartbeats are just the base operation that has to occur between meshed nodes, not that that itself is generally going to be the issue (the inter-node communication is likely to have a bit more going on than just that!).
Containers is where I've had issues, not necessarily anything drastic, but I've found myself dropping half of the the things I really want from an Erlang system (mainly making as much as possible non-stateful rather than stateful, not using supervision trees to their full potential) to buttress against that ephemeral nature of containers (I haven't really looked at diskless nodes in much detail either though)
I can't really go into it in detail, but at a high level Erlang default methods of distribution and security don't scale very well. There are people working on better mechanisms for this, and I know of several companies that have custom solutions for clustering nodes. One big issue is you cannot easily burst Erlang nodes to handle peak traffic. The number of nodes is usually relatively static in a deployment.
Bursting is an interesting thought. I think, if you planned it out, it could be done -- subject to some constraints.
It would be hard to burst stateful (mnesia) nodes --- schema operations require a lock across all the nodes in the schema, and that lock requires that the nodes not be in the middle of the 'log dumping' process (where the global transaction log gets divided into per table logs and such), which means long delays in high volume situations, and even longer delays if doing multiple schema operations. This could probably be patched around, but... In my team's experience, our mnesia nodes were generally ok under higher than normal load, expansion was driven by data size. Expansion could be a lot nicer, but I haven't heard of many database systems that handle expansion off the shelf.
So that leaves stateless nodes. I don't see why you couldn't burst those, especially if using standard dist. Bring up the host, push your software, connect to one dist node, and get meshed automatically, once you see all the pg2 groups you need to operate, enable traffic.
That said, we never did too much of that, we're in bare metal hosting so we don't have an incentive to run different server counts at different times of day, and provisioning isn't fast enough to handle incidental spikes -- we have a pretty good model of what spikes to expect, and provision to handle that load being mindful of the possibility of a load spike during a network or datacenter availability incident.
Our nodes are stateful. We also wrote our own dist (I didn't write it, it could be a lot better). We also have good metrics on our spikes, we just have black friday where we need to burst like crazy.
Oh yeah, black Friday is crazy. For us, we've always been seeing that our annual big spike load ends up being our sustained daily peak in a few months, so we will need those nodes anyway.
+ you possibly need to bear in mind the design goals: that Erlang is designed to run as a highly reliable, self contained system in _a single geographic location_, with that system possibly left to run on its own for long periods of time (years)