Hacker News new | ask | show | jobs
by RobertKerans 2800 days ago
Out-of-the-box, it's only really designed to scale to a certain point. It's all generally nice, predictable, and with with low latency up until that point. But because the nodes are fully meshed, the TCP heartbeats alone kill performance once you go past that point. So for ex. 100 nodes gives you 5050 TCP connections (100+99+98+97...etc). As parent says, there are methods to deal with this (Riak Core would be an example). But they're non-trivial.

+ you possibly need to bear in mind the design goals: that Erlang is designed to run as a highly reliable, self contained system in _a single geographic location_, with that system possibly left to run on its own for long periods of time (years)

1 comments

So, at one level, 5000 tcp connections is a lot, but at another level, some teams (including mine) are running hundreds of thousands of tcp connections to our clients from our front end Erlang nodes.

I've never thought about the dist heartbeats as a scalaing problem. If you have thousands of dist nodes, and your nodes have small memory, dist buffers for each connection to add up -- I think the default is 8mb, you can tune it, but it's a scaling concern. Especially, if you have nodes far apart from each other.

Really, the root design of Erlang was for two nodes colocated in a single chassis. That said, it turns out the design scales pretty well to much larger numbers of nodes, and nodes farther apart, but you have to be careful with some things. pg2:join and leave operate under a global lock, which will be slow if you have contention on the lock, or if one of your nodes has some problem where it's still up but very slow. Mnesia doesn't do well with queuing without a lot of help, schema operations under queuing is definitely a bad idea as well.

If you want to run Erlang at larger scales, you will need to be ready to poke around in OTP, and ocassionaly in BEAM as well. If you're running big systems, IMHO it makes the most sense for your Erlang nodes to fill your physical nodes, so I don't see much need for containers, but if you do use containers, you need to figure out how to get their names consistent for Erlang, or it's going to be confused. (OTP has a concept of a 'diskless' node which would seem to be a good fit for an ephemeral systems environment, but I must admit I haven't played with that)

> If you want to run Erlang at larger scales, you will need to be ready to poke around in OTP, and ocassionaly in BEAM as well.

That's essentially what I've had to do in my career as an Erlang engineer. Erlang requires way more massaging and work than the stories people tell about it would lead you to believe.

I don't think it's that much work. It's just that when you hit a wall, you have to fix it yourself. But many of the fixes are easy -- OTP usually does things in a very simple way, and sometimes something more complex is needed to scale beyond.

I think this is the case regardless of what languages or systems you use, but more well used systems may have more experts and more documentation to lean on.

For things that are a good fit for Erlang, it seems worth it to train up a couple people with deep internal knowledge of the VM you're using. As you said in another part of the thread, Erlang doesn't have a lot of abstraction -- most scaling problems aren't too many layers deep.

Yep, I kinda meant it as an illustration - the heartbeats are just the base operation that has to occur between meshed nodes, not that that itself is generally going to be the issue (the inter-node communication is likely to have a bit more going on than just that!).

Containers is where I've had issues, not necessarily anything drastic, but I've found myself dropping half of the the things I really want from an Erlang system (mainly making as much as possible non-stateful rather than stateful, not using supervision trees to their full potential) to buttress against that ephemeral nature of containers (I haven't really looked at diskless nodes in much detail either though)