Erlang/OTP: Garbage Collector | HN Mirror


	Erlang/OTP: Garbage Collector (medium.com)
	151 points by vkatsuba 1177 days ago

5 comments

dmpk2k 1176 days ago

The post glosses over the most important part of Erlang's GC: it collects process heaps separately. This transforms a hard problem (collecting a global heap with low latency despite concurrent mutators) into a _much_ simpler problem, at the price of more copying. Compare Java's G1 with Erlang's GC; the former hurts my head.

For those problems that are amenable to Erlang's model, this is a fine solution. The only real improvement here would be making collection incremental.

_0w8t 1176 days ago

Erlang also has reference counters for things like strings that are immutable and can be shared between threads (processes in Erlang).

Overall this is a good model. Use GC for small per green thread heaps. Then use reference counters for shared immutable structures that cannot form cycles and copy everything else.

bitwalker 1176 days ago

Erlang only uses reference counting for binaries larger than 64 bytes; everything else is allocated on the process heap (or in heap fragments) and copied. That alone is enough to have a beneficial effect, though, since large binaries are relatively common in practice and are frequently passed from process to process.
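To make the copy-versus-share split concrete, here is a minimal toy model in Python. It is not BEAM code: `Process`, `SharedBinary`, `make_binary`, and `send` are hypothetical names invented for this sketch, and the only figure taken from the thread is the 64-byte cutoff.

```python
# Toy model only: names and structure are illustrative, not BEAM internals.
REFC_THRESHOLD = 64  # bytes; the cutoff quoted in the comment above


class SharedBinary:
    """A large binary stored once, off-heap, with a reference count."""

    def __init__(self, data: bytes):
        self.data = data
        self.refcount = 0


class Process:
    """A process with a private heap, modelled as a list of terms."""

    def __init__(self, name: str):
        self.name = name
        self.heap = []

    def receive(self, term):
        self.heap.append(term)


def make_binary(owner: Process, data: bytes):
    """Create a binary in `owner`: on-heap if small, shared if large."""
    if len(data) <= REFC_THRESHOLD:
        term = bytearray(data)        # private copy on the owner's heap
    else:
        term = SharedBinary(data)
        term.refcount += 1            # the owner holds one reference
    owner.receive(term)
    return term


def send(term, receiver: Process):
    """Send a term: copy heap binaries, bump the count on shared ones."""
    if isinstance(term, SharedBinary):
        term.refcount += 1
        receiver.receive(term)             # same object, no payload copy
    else:
        receiver.receive(bytearray(term))  # fresh copy for the receiver


a, b = Process("a"), Process("b")
small = make_binary(a, b"x" * 10)
large = make_binary(a, b"y" * 1000)
send(small, b)
send(large, b)

small_copied = b.heap[0] is not a.heap[0]  # distinct objects, equal bytes
large_shared = b.heap[1] is a.heap[1]      # one payload, two references
print(small_copied, large_shared, large.refcount)
```

In this sketch the 1000-byte payload exists once with a count of two, while the 10-byte binary is duplicated into the receiver's heap.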

derefr 1176 days ago

And if you have any other global state you want to pass around, you can pull off a clever trick by passing it around as a binary and then unpacking it as needed within caller processes.

AFAIK this trick is why BEAM files use an IFF-derived format (easy to parse individual chunks out at runtime), and why erlang:module_info/{1,2} are the way they are: working with module metadata literally just means asking the code-server process for the (shared refcounted) module binary, and then parsing it yourself.

weatherlight 1176 days ago

I thought Erlang's Garbage collector was incremental by virtue of being per process. A system may have tens of thousands of processes, using a gigabyte of memory overall, but if GC occurs in a process with a 20K heap, then the collector only touches that 20K and collection time is imperceptible. With lots of small processes, you can think of this as a truly incremental collector.

It's not incremental per process, but I'm not sure it would even matter that much in practice.
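A rough sketch of why per-process collection behaves like an incremental collector at the system level, written as a toy Python model. The heap layout, `make_heap`, and `collect` are assumptions made up for illustration; the real collector is more involved.

```python
# Toy model: a heap maps a cell id to the list of cell ids it references.
def make_heap(size, live):
    """Build a heap of `size` cells where cells 0..live-1 form a
    reachable chain and the remaining cells are garbage."""
    heap = {i: ([i - 1] if 0 < i < live else []) for i in range(size)}
    roots = [live - 1]
    return heap, roots


def collect(heap, roots):
    """Copying collection of ONE heap: copy everything reachable from
    the roots into a new heap and report how many cells were copied."""
    new_heap, stack = {}, list(roots)
    while stack:
        cell = stack.pop()
        if cell not in new_heap:
            new_heap[cell] = heap[cell]
            stack.extend(heap[cell])
    return new_heap, len(new_heap)


# A system with many small processes, each with its own heap.
processes = [make_heap(size=100, live=50) for _ in range(1000)]
total_cells = sum(len(heap) for heap, _ in processes)

# Collecting one process touches only that process's live data.
heap, roots = processes[0]
new_heap, copied = collect(heap, roots)
processes[0] = (new_heap, roots)

print(total_cells, copied)
```

The work done by `collect` depends on the live data in one heap, not on `total_cells`. That is the sense in which many small heaps give short pauses, and also why a single process with a very large heap loses the benefit.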

dmpk2k 1176 days ago

Yes, that is how it works, except (as you implicitly note) that large heaps in single processes can cause problems; allowing incremental collection per heap would flatten the latency profile further.

ramchip 1176 days ago

Large GC jobs get scheduled on dirty schedulers today (a background thread pool), since it's not OK to block a normal scheduler more than 1ms or so in Erlang. If they could be split into smaller chunks of work, perhaps it could be done on normal schedulers, making time allocation more fair.

dfox 1176 days ago

Another point is that, due to Erlang's immutability, there cannot be pointers from the old generation into the nursery, and thus the GC does not need write barriers.
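The reasoning can be illustrated with a small Python sketch: if terms cannot be mutated after construction, a term can only reference terms that already existed when it was built, so references always point from newer to older data. The `Term` class and its allocation stamp are hypothetical devices for this illustration, not how the BEAM represents terms.

```python
import itertools

_clock = itertools.count()  # global allocation counter (illustrative)


class Term:
    """An immutable term stamped with its allocation time."""

    __slots__ = ("born", "children")

    def __init__(self, *children):
        # Bypass our own __setattr__ once, during construction only.
        object.__setattr__(self, "born", next(_clock))
        object.__setattr__(self, "children", children)

    def __setattr__(self, name, value):
        raise AttributeError("terms are immutable")


def all_terms(root):
    """Return every term reachable from `root` (shared terms may repeat)."""
    seen, stack = [], [root]
    while stack:
        term = stack.pop()
        seen.append(term)
        stack.extend(term.children)
    return seen


# Build a five-element cons list: each cell references older terms only.
lst = Term()  # the empty list
for _ in range(5):
    lst = Term(Term(), lst)

points_only_to_older = all(
    child.born < term.born
    for term in all_terms(lst)
    for child in term.children
)

# Without mutation there is no way to make an old term point at a new one.
try:
    lst.children = ()
    mutated = True
except AttributeError:
    mutated = False

print(points_only_to_older, mutated)
```

A collector for a mutable language needs a write barrier to catch stores that create old-to-young pointers; in this model such a store cannot be expressed at all.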

amelius 1176 days ago

Wouldn't Erlang be much more efficient if it simply compiled to the JVM?

_old_dude_ 1176 days ago

Almost 10 years ago, I tested Erjang [1] with a medium-sized application. Throughput was better than on the BEAM, but latency was terrible.

[1] https://github.com/trifork/erjang/

pron 1176 days ago

Ten years ago was two whole technological generations ago in the implementation of OpenJDK's GCs. OpenJDK now has a maximum pause time of under 1ms for heaps up to 16TB.

bitwalker 1176 days ago

I really strongly doubt that GC is a bottleneck for Erlang programs on either the BEAM or the JVM - the sophistication of the scheduler, and the way various language primitives interact with it, is where the BEAM is almost certainly gaining an edge over the JVM. That said, I'm sure there is a subset of programs that _would_ be faster on the JVM; it just depends on what metrics are being compared.

pron 1175 days ago

> the sophistication of the scheduler, and the way various language primitives interact with it

That was brought over to the JDK six months ago. The JDK can now spawn millions of Erlang-like processes ("virtual threads") per second.

Erlang is a great inspiration and it does incredibly well with the development resources available to it, but it's hard to compete with the level of engineering investment in the JDK and its state-of-the-art GCs, optimising JIT compilers, and low-overhead in-production tracing and profiling.

nickpeterson 1176 days ago

As the other reply noted, I’d be shocked if it weren’t much better now; seeing something like Graal being used would be really interesting. I think if Elixir could target the BEAM or the JVM, it would be an amazing language for many tasks.

throwawaymaths 1176 days ago

> it would be an amazing language

No, it wouldn't. Elixir is getting really fast computation through, e.g. nx, and the user story is incredible (OS install to stable diffusion in 40 minutes, most of which is dicking around figuring out how to install CUDA). Is it easy to run stable diffusion on jvm?

nickpeterson 1175 days ago

I think you might be taking my comment as implying the JVM is better than the BEAM, but that isn’t the argument I’m making. Having a strong JVM option means you can cut through a ton of corporate red tape. I don’t need to convince some skeptical CTO that the JVM is reasonable. It makes choosing Elixir for a project feel about as hard a change as using Clojure or Groovy.

lenkite 1176 days ago

The JVM standard does not support isolates, so it won't work. Java's father, Gosling, wanted to get isolation into the Java spec, but he failed.

The modern GraalVM does have isolates, but it's a VM-specific feature and not a Java standard feature.

jlouis 1176 days ago

It likely would. But efficiency is only one factor. Many Erlang applications are far more concerned with consistent latency than throughput efficiency. So a switch to the JVM is a lot of cost.

amelius 1176 days ago

Are these folks also running their software on a real-time OS?

weatherlight 1175 days ago

Erlang can be run bare metal.

vkatsuba 1175 days ago

You can take a look at the interview with Francesco Cesarini https://www.youtube.com/watch?v=-m31ag9z4VY for more details; part of it compares the JVM with the BEAM.

weatherlight 1175 days ago

Sure, on a single machine, perhaps. But once you have multiple machines, the JVM would have to do what the BEAM does today: copy messages between processes regardless of location. That's going to slow down throughput.

nesarkvechnep 1176 days ago

No.

vcryan 1176 days ago

Ha! Absolutely no

amelius 1176 days ago

Why not? JVM has a highly optimized concurrent GC.

toast0 1176 days ago

Erlang has a highly optimized concurrent GC as well. It's just optimized for different things. And maybe the concurrency of the GC is different; Erlang has one heap per process (aka green thread), and no concurrency within a heap.

Erlang GC is also very simple and easy to understand because language features only allow references in one direction. Much of JVM GC complexity would be wasted as there's no need to look for reference loops and such, since they're not possible.

omginternets 1176 days ago

It’s a GC tuned for imperative languages that prefer mutation over allocation, which is the exact inverse of what BEAM needs.

pron 1176 days ago

OpenJDK's GCs do have elaborate mechanisms to support mutation, but if they're unused they impose no extra overhead.

vkatsuba 1176 days ago

This is a good point, thanks! I will extend the article, or maybe it would be better to write a new one as a continuation, since putting everything into one article would make it longer and more difficult to read.

throwawaymaths 1176 days ago

> the price of more copying.

More copying if you pass values between processes. Honestly it would be really cool if you could mark off certain values that you know you're going to pass around and put them in a heap like the global binary heap.

benmmurphy 1176 days ago

There are lots of footguns for the user with this model. Because transferring data between processes involves copying, this can become a problem. Erlang tries to optimise the handling of large binaries by using a separate reference-counted heap. However, this introduces another set of issues where memory is 'leaked', either because a smaller binary is holding a reference to a larger binary or because processes that have not been GC'd have not decremented the ref count of large binaries in the heap that they no longer use.

throwawaymaths 1176 days ago

You literally listed the two biggest footguns and claimed there are "lots" of footguns. That really is it.

travisgriggs 1176 days ago

Scaling up an MQTT<->webhook relay that I wrote in Elixir to thousands of long-running connections, I found that I needed to manually trigger periodic GCs on my long-lived processes.

As binary strings work their way through the pipelines via messages, they leave binaries on the binary heap that don’t go away because the ref count never drops to zero. There are a number of GC parameters one can tune on a per-process level that might cause a long-lived process to collect more aggressively. But my long-lived processes have a natural “ratchet” point where it was just easy to throw a collect in. This solved all of my slow-growth memory problems.

I’ve read elsewhere that Erlang's GC often benefits from the fact that most Erlang processes are short-lived.
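The slow growth described here can be reproduced in a toy Python model. It assumes, as the thread suggests, that a process's reference to a shared binary is only released when that process is collected; `RefcBinary`, `RelayProcess`, and `shared_bytes` are hypothetical names for this sketch. On the BEAM itself, a manual collection of the calling process is requested with `erlang:garbage_collect/0`.

```python
class RefcBinary:
    """A reference-counted binary living on a shared heap (toy model)."""

    def __init__(self, size):
        self.size = size
        self.refcount = 0


class RelayProcess:
    """A long-lived process that forwards binaries and then forgets them."""

    def __init__(self):
        self.heap_refs = []  # references physically on the heap, live or dead
        self.live = set()    # ids of references the program can still reach

    def receive(self, binary):
        binary.refcount += 1
        self.heap_refs.append(binary)
        self.live.add(id(binary))

    def drop(self, binary):
        """The program is done with the binary; the heap slot stays until GC."""
        self.live.discard(id(binary))

    def garbage_collect(self):
        """Release dead references and decrement their counts."""
        kept = []
        for binary in self.heap_refs:
            if id(binary) in self.live:
                kept.append(binary)
            else:
                binary.refcount -= 1
        self.heap_refs = kept


def shared_bytes(binaries):
    """Bytes still pinned on the shared heap by a non-zero count."""
    return sum(b.size for b in binaries if b.refcount > 0)


relay = RelayProcess()
messages = [RefcBinary(1000) for _ in range(1000)]
for message in messages:
    relay.receive(message)
    relay.drop(message)  # forwarded; no longer needed by the program

before = shared_bytes(messages)
relay.garbage_collect()  # the manual "ratchet point" collection
after = shared_bytes(messages)
print(before, after)
```

In the model, a process that allocates little on its own heap never triggers a collection by itself, so the shared bytes stay pinned until the explicit `garbage_collect()` call.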

toast0 1176 days ago

There was some work to try to make this use case work with normal GC (RefC binaries count toward a process's garbage at their full size, rather than just the size of the reference on the process heap). But if you know your process should be pretty clean at some point, manually triggering GC will do a better job. Off-heap message passing might help in this case too.

Nezteb 1176 days ago

Is any part of that relay open-source by chance? If not, what libraries are you using?

travisgriggs 1176 days ago

I’m not a big fan of framework mashups, so I’ve kept it pretty light and straightforward. Also, I was learning (and of course still am) when I started putting this together, so having fewer things to learn was a boon.

- tortoise311 - I’ve toyed with rewriting it myself. We do very simple MQTT: QoS 0, no wills, etc. The existing implementation creates many long-lived procs per connection, and we keep our connections live; they’re mostly subscribers

- bandit/plug - originally I was doing Phoenix because That’s The Thing, but it was such “A Way”, I was constantly having to learn how to accommodate things I just ended up turning off or suppressing. I just have straightforward (imo) API endpoints; Mat Trudel suggested I might just use Bandit with Plug. He’s done a great job with Bandit and been very proactive; just doing Plug myself helped me understand the whole HTTP handling pipeline at a more fundamental level

- Cachex - we use the OAuth credentials workflow. We were able to implement that in a single plug using Cachex. I may throw that out eventually. I’ve heard people say Cachex has hung on them, and it’s easy enough to do your own here

- Mint - I tried Finch and a couple other “help you” request frameworks. I had all kinds of problems tuning them as I moved up to many thousands of steady stream (every 10s+) hooks being dispatched. Eventually, I saw a comment in one of them that said something like “at any scale, you end up doing your own layer on top of mint to best fit the nuances of your application”, so I did just that, using the source from peppermint and finch to guide/inspire me

- openapispex - to swaggerify our endpoints; this requires a lot of boilerplate code and forced me to learn to write some of my own macros just to reduce it a little; I understand you get some of that for free when using it with Phoenix; the authors have been really helpful

- recon - because

There’s probably some stuff I should use that I’m not. But I’ve got a limited amount of time to improve this and keep native apps on two platforms running.

If I blogged, it’d be a good write up (how to do a kind of web thing — but without pages — without Phoenix!) maybe.

tylerpachal 1175 days ago

With Cachex being involved, the memory leak you mention in your parent comment might be caused by ETS holding references to your binaries. I ran into something similar a few years ago and wrote about it here[0], but the tl;dr is that you can use `:binary.copy/1` or the `:copy` option in the `Jason` module if you are using it.

[0] https://tylerpachal.medium.com/tracking-down-an-ets-related-...
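The same pattern exists outside the BEAM, which may make it easier to picture. The Python sketch below is an analogy only, not Erlang or Elixir code: a `memoryview` slice plays the role of a sub-binary that keeps its large parent alive, and `bytes(...)` plays the role that `:binary.copy/1` plays in the comment above. The variable names are made up for the example.

```python
# Analogy in Python: a slice that references its parent versus a detached copy.
payload = bytes(1_000_000)          # stands in for a large binary
token = memoryview(payload)[:16]    # 16-byte slice taken without copying

# The slice still points at the whole parent buffer, so storing `token`
# somewhere long-lived (a cache, a table) would pin all of `payload`.
pins_parent = token.obj is payload
retained = len(token.obj)

# An explicit copy owns just its own 16 bytes and no longer needs the parent.
detached = bytes(token)
detached_size = len(detached)

print(pins_parent, retained, detached_size)
```

Storing `detached` instead of `token` is the step that lets the large parent be reclaimed once nothing else refers to it.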

Nezteb 1175 days ago

I appreciate the list, thanks! `tortoise311` was the only one I hadn't previously heard of.

sacnoradhq 1176 days ago

ORCA (part of the compiled language Pony) includes a more performant GC than C4 or BEAM/HiPE. It reduces the need for global GC pauses almost to zero by sharding the heap per actor and by using zero-copy message passing, fine-grained concurrent sharing semantics, and lock-free data structures.

bitwalker 1176 days ago

I mean, the BEAM doesn't have global GC pauses either, as each process has its own heap - but I would expect Pony can take things a step further as a result of its strong type system, which IIRC is why it can support zero-copy messaging.

sacnoradhq 1176 days ago

This is true: Erlang has a heap per PID. Azul's C4 and other JVM GCs are moving away from stop-the-world pauses, but they're still at the mercy of the JVM's model.

If one can avoid GC altogether through precise (de)allocation, à la Rust's non-reference-counted entities, that is cool, but it often requires unnatural contortions. RC is still necessary in certain cases.

isaacsanders 1176 days ago

This is another article with more details: https://hamidreza-s.github.io/erlang%20garbage%20collection%...

vkatsuba 1176 days ago

If you want to expand the examples or improve the topic - just leave a comment about it.