So, at one level, 5,000 TCP connections is a lot, but at another level, some teams (including mine) are running hundreds of thousands of TCP connections from our front-end Erlang nodes to our clients. I've never thought of the dist heartbeats as a scaling problem. The per-connection dist buffers are a different story: if you have thousands of dist nodes and your nodes have small memory, those buffers add up. I think the default is 8 MB; you can tune it, but it's a scaling concern, especially if your nodes are far apart from each other.

Really, Erlang was originally designed for two nodes colocated in a single chassis. That said, it turns out the design scales pretty well to much larger numbers of nodes, and to nodes farther apart, but you have to be careful with some things. pg2:join and pg2:leave operate under a global lock, which will be slow if there is contention on the lock, or if one of your nodes has a problem where it's still up but very slow. Mnesia doesn't handle queuing well without a lot of help, and running schema operations while messages are queued is definitely a bad idea. If you want to run Erlang at larger scales, you need to be ready to poke around in OTP, and occasionally in BEAM as well.

If you're running big systems, IMHO it makes the most sense for your Erlang nodes to fill your physical nodes, so I don't see much need for containers. If you do use containers, you need to figure out how to keep their node names consistent, or Erlang is going to get confused. (OTP has a concept of a 'diskless' node, which would seem to be a good fit for an ephemeral environment, but I must admit I haven't played with that.)
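To put the dist buffer concern in rough numbers, here is a back-of-the-envelope sketch. It assumes a default full mesh (every node connected to every other node) and a worst case where every connection's buffer is full at once; the 8 MB figure is the one recalled above, not a verified OTP default, and the function names are purely illustrative. Note that a full mesh of 100 nodes is 4,950 connections, roughly the 5,000 figure discussed.

```python
# Hypothetical back-of-the-envelope estimate, not an OTP API.
# Assumptions: full-mesh distribution (each node holds one connection
# per peer) and each connection may fill a buffer of buffer_mb megabytes.
# 8 MB is the figure recalled in the comment, not a verified default.


def total_connections(nodes: int) -> int:
    """Number of TCP connections in a full mesh of `nodes` nodes."""
    return nodes * (nodes - 1) // 2


def per_node_buffer_mb(nodes: int, buffer_mb: float = 8.0) -> float:
    """Worst-case dist buffer memory on a single node: one buffer per peer."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return (nodes - 1) * buffer_mb


if __name__ == "__main__":
    for n in (10, 100, 1000, 5000):
        gb = per_node_buffer_mb(n) / 1024  # MB -> GB
        print(f"{n:>5} nodes: {total_connections(n):>10,} connections, "
              f"up to {gb:,.1f} GB of dist buffers per node")
```

This is a ceiling, not a typical figure: buffers only fill when a peer can't keep up, which is exactly the slow or far-away node case described above.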
That's essentially what I've had to do in my career as an Erlang engineer. Erlang requires way more massaging and work than the stories people tell about it would lead you to believe.