Hacker News
by toast0 3265 days ago
Erlang's core reason for existence is to control telephone switches, which had two independent general-purpose computers connected to the physical switch. So reliability, redundancy, recovery, and fault isolation were the core needs; that drove the design for isolated processes with message passing between them. Because Erlang was in the control plane, and only managing the signal path, not passing the signals itself, there wasn't a big need for speed, as long as it wasn't too slow.

Fast forward several years, and isolated processes turn out to be a great fit for large SMP systems, and Erlang/BEAM is now doing signal path work in a lot of places. Erlang tends not to put explicit limits, but some techniques are going to fail at large scale. For example, if you have 50,000 processes across many nodes, sending the same message to each of those processes is going to be slow; sending one message to each node and fanning out from there is going to be faster, in no small part because you've reduced the network bandwidth you're using.
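To put rough numbers on the fan-out point, here is a minimal counting sketch in Python (not Erlang). It only counts cross-node messages under an assumed layout; the 10-node cluster, the even spread of processes, and the function names are all hypothetical and not from the original comment.

```python
def direct_send_count(process_nodes, local_node):
    """Cross-node messages if the sender messages every remote process itself."""
    return sum(1 for node in process_nodes if node != local_node)


def per_node_fanout_count(process_nodes, local_node):
    """Cross-node messages if the sender sends once per remote node and a
    relay on each node delivers to its local processes."""
    return len({node for node in process_nodes if node != local_node})


# Assumed layout: 50,000 processes spread evenly over 10 nodes (hypothetical).
nodes = [f"node{i}" for i in range(10)]
process_nodes = [nodes[i % len(nodes)] for i in range(50_000)]

direct = direct_send_count(process_nodes, "node0")
fanned = per_node_fanout_count(process_nodes, "node0")
print(direct, fanned)
```

Under these assumptions the sender's node pushes 45,000 copies over the network in the direct case and 9 in the per-node case; the local deliveries still happen, but they never touch the network.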

The nice thing when you hit Erlang scaling limits is that almost everything you need to fix is going to be in a pretty simple state. You're not going to find many things that are layers of optimizations on top of hacks on top of optimizations; they do a good job of keeping things simple, and not optimizing until it's needed (and even then, they usually pick simple optimizations). Keeping things simple goes a really long way (especially with today's enormous servers).

Edited to add: I don't think they've even needed to tweak the vm yet either, just their user space code. That's pretty huge too.


That, indeed. When I compare Elixir/Erlang to some other systems I worked on, "shallow" is the word that pops up. You hit a limitation, you dig into some source code, and you find out that it's pretty simple to understand and to fix. It feels manageable, and I have yet to run into frustrating roadblocks. That all gives me the confidence that when I do need to scale up, I have a system I will understand and will be able to adapt. It looks like Discord's story confirms that.
It sounds like the main benefit to Elixir is that message handling is built into the language. How does that compare to using a message queue service like zeromq?
As macintux said, they don't really compare. Messaging is everywhere in Erlang, in a way that nobody would do with a message queue. For example, you don't read or write to a TCP socket; you receive and send messages to a 'port'. The same is true for file I/O. Rather than calling a method on a shared object, you generally would send a message to a process that owns the state (or a process that manages the state in a database).
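For readers who haven't seen the "process that owns the state" pattern, here is a rough Python analogy using a thread and a queue as a stand-in for an Erlang process and its mailbox. It is only a sketch: Python threads are not isolated the way Erlang processes are, and the message shapes (`"increment"`, `"get"`, `"stop"`) are made up for illustration.

```python
import queue
import threading


def counter_owner(mailbox):
    """Owns the counter; the only code that ever touches `count`."""
    count = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "increment":
            count += msg[1]
        elif msg[0] == "get":
            msg[1].put(count)  # reply goes to the caller's own mailbox
        elif msg[0] == "stop":
            return


mailbox = queue.Queue()
owner = threading.Thread(target=counter_owner, args=(mailbox,))
owner.start()

# Callers never read or write the counter directly; they only send messages.
for _ in range(3):
    mailbox.put(("increment", 2))

reply_box = queue.Queue()
mailbox.put(("get", reply_box))
total = reply_box.get(timeout=5)

mailbox.put(("stop",))
owner.join()
print(total)
```

Because the owner handles one message at a time, there is no lock around the counter; ordering comes from the mailbox.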

Sending messages to processes on other nodes has the same syntax as sending to a process on your node, which makes it easy to run a distributed system. (Ports are different; you'd have to set up a proxy process on the remote node in order to send to or receive from one.)

Of course, with the base of process-to-process messaging you can build a higher-level message queue (see RabbitMQ for a popular message queue built in Erlang).

Messaging is implicit in everything Erlang & Elixir do. Bolting on a message queue to software written in another language isn't really comparable (not a value judgement, it's just not really useful to compare them).
>I don't think they've even needed to tweak the vm yet either, just their user space code.

We haven't really had to. Really only args we use are "+sbt db +zdbbl 32000 +K true" and increasing the default process limit.