The post glosses over the most important part of Erlang's GC: it collects process heaps separately. This transforms a hard problem (collecting a global heap with low latency despite concurrent mutators) into a _much_ simpler problem, at the price of more copying. Compare Java's G1 with Erlang's GC; the former hurts my head.
For those problems that are amenable to Erlang's model, this is a fine solution. The only real improvement here would be making collection incremental.
Erlang also has reference counters for things like strings that are immutable and can be shared between threads (processes in Erlang).
Overall this is a good model. Use GC for small per green thread heaps. Then use reference counters for shared immutable structures that cannot form cycles and copy everything else.
Erlang only uses reference counting for binaries larger than 64 bytes; everything else is allocated on the process heap (or in heap fragments) and copied. That alone is enough to have a beneficial effect, though, since large binaries are relatively common in practice and are frequently passed from process to process.
And if you have any other global state you want to pass around, you can pull off a clever trick by passing it around as a binary and then unpacking it as needed within caller processes.
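A minimal sketch of that trick, assuming the shared state is a plain map (the `config` contents and the message shapes are made up for illustration). A serialized term over 64 bytes ends up as a refc binary, so the send passes a reference instead of copying the whole structure, and the receiver only pays the decode cost when it needs the data:

```elixir
# Hypothetical shared state; any term would do.
config = Map.new(1..1_000, fn i -> {i, "value #{i}"} end)

# Pack once. A binary this large is reference counted, not copied per send.
packed = :erlang.term_to_binary(config)

parent = self()

worker =
  spawn(fn ->
    receive do
      {:config, bin} ->
        # Unpack inside the receiving process, only when needed.
        unpacked = :erlang.binary_to_term(bin)
        send(parent, {:unpacked_size, map_size(unpacked)})
    end
  end)

send(worker, {:config, packed})

size =
  receive do
    {:unpacked_size, n} -> n
  after
    1_000 -> :timeout
  end

IO.inspect(size, label: "entries seen by worker")
```

The trade-off is that every consumer decodes its own copy onto its own heap, so this pays off mainly when the binary is passed through more processes than actually unpack it.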
AFAIK this trick is why BEAM files use an IFF-derived format (easy to parse individual chunks out at runtime), and why erlang:module_info/{1,2} are the way they are: working with module metadata literally just means asking the code-server process for the (shared refcounted) module binary, and then parsing it yourself.
I thought Erlang's Garbage collector was incremental by virtue of being per process.
A system may have tens of thousands of processes, using a gigabyte of memory overall, but if GC occurs in a process with a 20K heap, then the collector only touches that 20K and collection time is imperceptible. With lots of small processes, you can think of this as a truly incremental collector.
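A rough way to see this on a running node, as a sketch only (the process count and the `:stop` message are arbitrary choices for illustration): spawn a lot of idle processes, then compare one process's heap with the memory used by all processes.

```elixir
# Spawn many idle processes; each one gets its own small heap.
pids =
  for _ <- 1..10_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

word_size = :erlang.system_info(:wordsize)

# Heap sizes are reported in machine words.
{:total_heap_size, words} = Process.info(hd(pids), :total_heap_size)
one_heap_bytes = words * word_size
all_process_bytes = :erlang.memory(:processes)

IO.puts("one process heap:   #{one_heap_bytes} bytes")
IO.puts("all process memory: #{all_process_bytes} bytes")

# Collecting one process only walks that process's heap.
collected = :erlang.garbage_collect(hd(pids))

Enum.each(pids, &send(&1, :stop))
```

The exact numbers depend on the OTP release and emulator flags, so none are quoted here.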
It's not incremental per process, but I'm not sure it would even matter that much in practice.
Yes, that is how it works, except (as you implicitly note) that large heaps in single processes can cause problems; allowing incremental collection per heap would flatten the latency profile further.
Large GC jobs get scheduled on dirty schedulers today (a background thread pool), since it's not OK to block a normal scheduler more than 1ms or so in Erlang. If they could be split into smaller chunks of work, perhaps it could be done on normal schedulers, making time allocation more fair.
Ten years ago was two whole technological generations ago in the implementation of OpenJDK's GCs. OpenJDK now has a maximum pause time of under 1ms for heaps up to 16TB.
I really strongly doubt that GC is a bottleneck for Erlang programs on either the BEAM or the JVM - the sophistication of the scheduler, and the way various language primitives interact with it, is where the BEAM is almost certainly gaining an edge over the JVM. That said, I'm sure there is a subset of programs that _would_ be faster on the JVM; it just depends on what metrics are being compared.
> the sophistication of the scheduler, and the way various language primitives interact with it
That was brought over to the JDK six months ago. The JDK can now spawn millions of Erlang-like processes ("virtual threads") per second.
Erlang is a great inspiration and it does incredibly well with the development resources available to it, but it's hard to compete with the level of engineering investment in the JDK and its state-of-the-art GCs, optimising JIT compilers, and low-overhead in-production tracing and profiling.
As the other reply noted, I’d be shocked if it weren’t much better now; seeing something like Graal being used would be really interesting. I think if Elixir could target the BEAM or the JVM, it would be an amazing language for many tasks.
No, it wouldn't. Elixir is getting really fast computation through, e.g. nx, and the user story is incredible (OS install to stable diffusion in 40 minutes, most of which is dicking around figuring out how to install CUDA). Is it easy to run stable diffusion on jvm?
I think you might be taking my comment as implying the JVM is better than the BEAM, but that isn’t the argument I’m making. Having a strong JVM option means you can cut through a ton of corporate red tape. I don’t need to convince some skeptical CTO if the JVM is reasonable. It makes choosing Elixir for a project feel about as hard a change as using Clojure or Groovy.
It likely would. But efficiency is only one factor. Many Erlang applications are far more concerned with consistent latency than throughput efficiency. So a switch to the JVM is a lot of cost.
You can take a look at the interview with Francesco Cesarini https://www.youtube.com/watch?v=-m31ag9z4VY for more details; part of it compares the JVM with the BEAM.
Sure, on a single machine, perhaps. But once you have multiple machines, the JVM would have to do what the BEAM does today: copy messages between processes regardless of location. That's going to slow down throughput.
Erlang has a highly optimized concurrent GC as well. It's just optimized for different things. And maybe the concurrency of the GC is different; Erlang has one heap per process (aka green thread), and no concurrency within a heap.
Erlang GC is also very simple and easy to understand because language features only allow references in one direction. Much of JVM GC complexity would be wasted as there's no need to look for reference loops and such, since they're not possible.
This is a good point, thanks! I will extend the topic, or maybe it would be better to write a new article as a continuation of this one, since putting everything in one article would make it longer and harder to read.
More copying if you pass values between processes. Honestly it would be really cool if you could mark off certain values that you know you're going to pass around and put them in a heap like the global binary heap.
There are lots of foot guns for the user with this model. Because transferring data between processes involves copying, this can become a problem. Erlang tries to optimise the handling of large binaries by using a separate reference-counted heap. However, this introduces another set of issues where memory is 'leaked', either because a smaller binary is holding a reference to a larger binary, or because processes that have not been GC'd have not decremented the ref count of large binaries in the heap that they no longer use.
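The "smaller binary holding a larger one" case can be observed with `:binary.referenced_byte_size/1`. This is a sketch with arbitrary sizes; the slice is kept above 64 bytes on purpose, since smaller slices may simply be copied onto the process heap:

```elixir
# 1 MB binary: well over the 64-byte limit, so it lives on the shared binary heap.
big = :binary.copy(<<0>>, 1_000_000)

# A 100-byte slice is a sub-binary that still points into `big`.
<<slice::binary-size(100), _rest::binary>> = big
slice_refs = :binary.referenced_byte_size(slice)

# An explicit copy gets its own storage and no longer pins `big`.
copy = :binary.copy(slice)
copy_refs = :binary.referenced_byte_size(copy)

IO.inspect({byte_size(slice), slice_refs, copy_refs},
  label: "{slice size, bytes pinned by slice, bytes pinned by copy}"
)
```

On the releases I'm aware of, the slice should report the full size of `big` while the copy reports only its own 100 bytes. That gap is the memory that stays pinned for as long as the slice is reachable.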
Scaling up an MQTT<->webhook relay that I wrote in Elixir to thousands of long-running connections, I found that I needed to manually trigger periodic GCs on my long-lived processes.
As binary strings work their way through the pipelines via messages, they leave binaries on the binary heap that don’t go away because the ref count never drops to zero. There are a number of GC parameters one can tune on a per-process level that might cause a long-lived process to collect more aggressively. But my long-lived processes have a natural “ratchet” point where it was just easy to throw a collect in. This solved all of my slow-growth memory problems.
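Roughly what that looks like, as a sketch rather than the actual relay code: `RelaySketch`, the `relay/2` call and the every-100-messages threshold are all hypothetical, and the real forwarding work is elided.

```elixir
defmodule RelaySketch do
  # Hypothetical long-lived connection process; the forwarding work is elided.
  use GenServer

  # Assumed "ratchet" interval; tune it to the workload.
  @gc_every 100

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def relay(pid, payload) when is_binary(payload),
    do: GenServer.call(pid, {:relay, payload})

  @impl true
  def init(:ok), do: {:ok, 0}

  @impl true
  def handle_call({:relay, payload}, _from, relayed) do
    # Stand-in for dispatching the webhook.
    _bytes = byte_size(payload)

    relayed = relayed + 1

    # At the ratchet point, collect this process so that references to
    # binaries it no longer uses are dropped and their ref counts decrement.
    if rem(relayed, @gc_every) == 0, do: :erlang.garbage_collect()

    {:reply, :ok, relayed}
  end
end

{:ok, pid} = RelaySketch.start_link()
:ok = RelaySketch.relay(pid, :binary.copy("x", 1_000))
```

`Process.info(pid, :binary)` lists the refc binaries a process still references, which is handy for checking whether the collect is doing anything.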
I’ve read elsewhere that Erlang’s GC often benefits from the fact that most Erlang processes are short-lived.
There was some work to try to make this use case work with normal GC (refc binaries count toward a process's garbage by their full size, rather than just the size of the reference on the process heap). But if you know your process should be pretty clean at some point, manually triggering GC will do a better job. Off-heap message passing might help this case too.
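For the off-heap option, the per-process switch is `Process.flag(:message_queue_data, :off_heap)`. A small sketch that sets it and reads it back (the `:ping`/`:pong` messages are just scaffolding for the example); whether it actually helps with lingering binaries depends on the workload:

```elixir
parent = self()

pid =
  spawn(fn ->
    # Keep queued messages out of this process's own heap.
    Process.flag(:message_queue_data, :off_heap)

    receive do
      :ping ->
        send(parent, {:pong, Process.info(self(), :message_queue_data)})
    end
  end)

send(pid, :ping)

setting =
  receive do
    {:pong, info} -> info
  after
    1_000 -> :timeout
  end

IO.inspect(setting)
```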
I’m not a big fan of framework mashups, so I’ve kept it pretty light and straightforward. Also, I was learning (and of course still am) when I started putting this together, so fewer things to learn was a boon.
- tortoise311 - I’ve toyed with writing my own. We do very simple MQTT: QoS 0, no wills, etc. The existing implementation creates many long-lived procs per connection and we keep our connections live; they’re mostly subscribers
- bandit/plug - originally I was doing Phoenix because That’s The Thing, but it was such “A Way”, I was constantly having to learn how to accommodate things I just ended up turning off or suppressing. I just have straightforward (imo) API endpoints; Mat Trudel suggested I might just use Bandit with Plug. He’s done a great job with Bandit and been very proactive; just doing Plug myself helped me understand the whole HTTP handling pipeline at a more fundamental level
- CacheX - we use an OAuth credentials workflow. We were able to implement that in a single plug and use cachex. I may throw that out eventually. I’ve heard people indicate cachex has hung on them and it’s easy enough to do your own here
- Mint - I tried Finch and a couple other “help you” request frameworks. I had all kinds of problems tuning them as I moved up to many thousands of steady stream (every 10s+) hooks being dispatched. Eventually, I saw a comment in one of them that said something like “at any scale, you end up doing your own layer on top of mint to best fit the nuances of your application”, so I did just that, using the source from peppermint and finch to guide/inspire me
- openapispex - to swaggerify our endpoints; this requires a lot of boilerplate code and forced me to learn to write some of my own macros just to reduce it a little; I understand you get some of that for free when using it with Phoenix; the authors have been really helpful
- recon - because
There’s probably some stuff I should use that I’m not. But I’ve got a limited amount of time to improve this and keep native apps on two platforms running.
If I blogged, it’d be a good write up (how to do a kind of web thing — but without pages — without Phoenix!) maybe.
With CacheX being involved, the memory leak you mention in your parent comment might be caused by ETS holding references to your binaries. I ran into something similar a few years ago and wrote about it here[0], but the tl;dr is that you can use `:binary.copy/1`, or the `strings: :copy` option of `Jason.decode/2` if you are using Jason.
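A sketch of the `:binary.copy/1` fix with a bare ETS table (the table name, the `:token` key and the sizes are invented for illustration; a cache library sitting on top of ETS would behave the same way underneath):

```elixir
table = :ets.new(:cache_sketch, [:set, :public])

# Pretend this is a large payload that arrived over the wire.
payload = :binary.copy("x", 100_000)
<<token::binary-size(100), _rest::binary>> = payload

# Storing `token` directly would keep all of `payload` alive for as long as
# the ETS entry exists. Copying first stores only the 100 bytes.
:ets.insert(table, {:token, :binary.copy(token)})

[{:token, stored}] = :ets.lookup(table, :token)
stored_refs = :binary.referenced_byte_size(stored)

IO.inspect(stored_refs, label: "bytes pinned by the cached token")
```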
ORCA (as part of the Pony compiled language) includes a more performant GC than C4 or BEAM/HiPE. It does so by reducing almost to zero the need to do global GC pauses by sharding the heap per actor, zero-copy message passing, fine-grained concurrent sharing semantics, and lock-free data structures.
I mean, the BEAM doesn't have global GC pauses either, as each process has its own heap - but I would expect Pony can take things a step further as a result of its strong type system, which IIRC is why it can support zero-copy messaging.
This is true: Erlang has a heap per PID. Azul's C4 and other JVM GCs are moving in the no-stop-the-world direction, but they're still at the mercy of the JVM's model.
If one can avoid GCs altogether, à la precise (de)allocations like Rust's non-reference-counted entities, this is cool but often requires unnatural contortionism. RC is still necessary in certain cases.