by Skinney 2233 days ago
Java's GC has to be best in class because of shared memory. In a shared-nothing world, doing GC in one "thread" doesn't stop other threads from executing. It also means that each heap can be very small, so you might not even need to perform GC before the thread is done executing. It's truly amazing what Java is doing, but keep in mind that Erlang has worked this way for _decades_. And a classic web server that spins up one thread/process per request can still potentially respond to the request with zero garbage collection in the best case, irrespective of load. This will not be true for Shenandoah or ZGC.

Does Java's hot code reloading support data migration? One benefit of Erlang's model is that you can execute hooks when HCR is performed to make sure your data in memory is migrated to a new format.

But really, the most important thing about Erlang's actor model is error handling. If I spin up a process in Erlang and it fails, it won't corrupt the state of my other processes. In Java this can only be attained through discipline, since all memory is shared. Also, I can very easily specify which processes should work together as units, such that if one fails, they all fail and can be restarted together from a known working state. This, again, requires discipline in Java.


Per-thread GC is definitely a different approach than Java takes. The trade-off is that shared memory between Java threads is nearly free. It's basically the same approach C++ uses, except Java has better concurrency primitives because of its VM. Not sure about Erlang, but data sharing between processes in JS and Python is very expensive and a frequent criticism of those languages. You can achieve zero garbage per request in Java. Typically, high-performance web frameworks like Undertow and Vert.x are designed this way. User code rarely does it, but it's definitely possible.

Not sure what you mean by data migration on code reloading. I suspect the mechanisms are different enough that they can't be compared. With Java you can load arbitrary new code, but changes to existing code are limited in ways that prevent data incompatibilities. For example, you can add fields to existing objects but you can't change the type of existing ones.

Data corruption from threading is rare in Java. I can't remember the last time I ran into it. It's easy to do, but everyone is used to threads and the concurrency implementation is one of the best I've used. Java also supports thread groups to ensure that threads die and get restarted together. It's not automatic, you need to manage the groups, but I think it achieves the same.
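
To make "you need to manage the groups" concrete, here is a rough sketch of a hand-written one-for-all restart in Java, loosely imitating what an Erlang supervisor does. Everything in it (`OneForAllSupervisor`, `runGroup`, `SupervisorDemo`) is a hypothetical name invented for illustration, and it uses an executor rather than `java.lang.ThreadGroup`, which groups threads but does not restart them by itself.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: runs a group of tasks as one unit.
final class OneForAllSupervisor {
    // If any task fails, the remaining tasks are interrupted and the whole
    // group is rebuilt from the factory (the "known working state").
    // Returns how many restarts were needed.
    static int runGroup(Supplier<List<Callable<Void>>> groupFactory, int maxRestarts) {
        for (int restarts = 0; ; restarts++) {
            ExecutorService pool = Executors.newCachedThreadPool();
            CompletionService<Void> done = new ExecutorCompletionService<>(pool);
            List<Callable<Void>> group = groupFactory.get();
            try {
                for (Callable<Void> task : group) {
                    done.submit(task);
                }
                // Futures arrive in completion order; get() rethrows a failure.
                for (int i = 0; i < group.size(); i++) {
                    done.take().get();
                }
                return restarts;
            } catch (ExecutionException e) {
                if (restarts >= maxRestarts) {
                    throw new IllegalStateException("group kept failing", e.getCause());
                }
                // Otherwise fall through and restart the whole group.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } finally {
                pool.shutdownNow(); // interrupt whatever is still running
            }
        }
    }
}

final class SupervisorDemo {
    static int run() {
        AtomicInteger attempts = new AtomicInteger();
        // Fails on the first attempt only, simulating a transient fault.
        Callable<Void> flaky = () -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("boom");
            }
            return null;
        };
        Callable<Void> steady = () -> null;
        return OneForAllSupervisor.runGroup(() -> List.of(flaky, steady), 3);
    }
}
```

Note that this only restarts the tasks. Any state they share on the heap is not rolled back, which is where the discipline comes in.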

In Erlang, processes need to send messages to each other, and those messages are copies (nothing is shared). This is less efficient than in Java, where everything is shared, but it also means that process A cannot change something that process B is looking at. So locks aren't necessary in Erlang. It also enables easy distribution: when all processes share data by messaging, it doesn't matter if those processes are running on the same machine or are distributed across a network.
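
For the Java readers, copy-on-send can be imitated with a queue that snapshots each message. This is only a sketch under assumptions: `Mailbox` and `MailboxDemo` are made-up names, and nothing here stops other code from sharing mutable objects, which Erlang rules out by construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical mailbox: every message is copied when it is sent.
final class Mailbox {
    private final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();

    // The receiver gets an unmodifiable snapshot, so later changes the
    // sender makes to its own list are not visible on the other side.
    void send(List<String> message) throws InterruptedException {
        queue.put(List.copyOf(message));
    }

    List<String> receive() throws InterruptedException {
        return queue.take();
    }
}

final class MailboxDemo {
    static List<String> run() {
        Mailbox mailbox = new Mailbox();
        AtomicReference<List<String>> seen = new AtomicReference<>();
        Thread receiver = new Thread(() -> {
            try {
                seen.set(mailbox.receive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();
        try {
            List<String> draft = new ArrayList<>(List.of("hello"));
            mailbox.send(draft);
            draft.add("mutated after send"); // does not reach the receiver
            receiver.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get(); // the receiver only ever saw ["hello"]
    }
}
```

The copy costs an allocation per message, which is the efficiency trade-off described above.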

Since Erlang has one GC per process, you can create garbage in one process without triggering GC if that process is short-lived. Once the process dies, its entire heap is freed at once. So in Java you'd have to write code in a special way to avoid GC, but in Erlang that happens automatically if your process exits before its heap needs GC. And in Erlang it's pretty normal to run one process per HTTP request, so this does happen in practice, without requiring anything of the programmer.

As for hot code reloading and data migration: when you hot load code into an Erlang VM, a hook will be called (if defined) that lets you migrate all in-memory data to the new format. So you're not restricted by data incompatibility.

Your last paragraph is what I meant by required discipline. Everyone who touches the code is required to understand what causes corruption and what doesn't. It also requires that you know which classes are thread-safe and which aren't, which is hopefully documented somewhere. Thread groups need to be understood (I work in Java/Kotlin every day, and I didn't know what thread groups were before today). In Erlang, data corruption due to multiple processes doesn't happen, and grouping processes together (supervision trees) is so common I can't remember the last time I saw an Erlang program without one.

Which of course doesn't mean that Erlang is superior to Java. But when you're working on something highly concurrent that needs to be fault tolerant, I'd argue that you'd get a better result with less effort in Erlang than in Java. But of course, if you know Java really well and don't know Erlang at all, YMMV.

Different strokes I guess.

Erlang's model with fibers and message passing sounds close to Golang. Java has decent support for immutable objects: immutable collections, Lombok and the FreeBuilder library (both build-time code generators), and Java 14 record types. Automatic passing between machines is unique to Erlang.
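
A minimal sketch of the record plus immutable collection combination, assuming a JDK where records are final (they were a preview feature in Java 14 and became standard in Java 16). `Order` and `RecordDemo` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: fields are final, and the compact constructor
// takes a defensive, unmodifiable copy of the incoming list.
record Order(String id, List<String> items) {
    Order {
        items = List.copyOf(items);
    }
}

final class RecordDemo {
    // Returns true if the record is unaffected by changes to the source
    // list and rejects attempts to modify its own list.
    static boolean isolated() {
        List<String> source = new ArrayList<>(List.of("a"));
        Order order = new Order("o-1", source);
        source.add("b"); // the record keeps its own copy

        boolean rejected = false;
        try {
            order.items().add("c");
        } catch (UnsupportedOperationException e) {
            rejected = true;
        }
        return rejected && order.items().equals(List.of("a"));
    }
}
```

Without the `List.copyOf` call the record would still hold a reference to the caller's mutable list, so the record alone does not give you deep immutability.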

Per-process GC isn't anything like what Java does, but the new GCs are probably fast enough that it doesn't matter in practice. For any sanely sized heap the GC pauses are around 0.5 milliseconds. This wasn't true until a few years ago, and in production most people don't know or care enough to use the new GCs.

You are right about thread safety in objects. Thankfully the JDK surface is fully documented. Third-party libraries usually are. Internal code is a crapshoot. It requires discipline, but I still find corruption rare in practice because the normal patterns lend themselves to thread safety.

I think it's safe to say that Java is a lower-level language than Erlang, one that enables many of the same patterns with less convenience. You can probably get better performance with Java, but your fault tolerance completely depends on how good your coders are. Java will not save you from doing stupid things between threads.

Sounds about right :)

Just wanted to touch on one point. Golang also has shared memory, even though it encourages sharing by communicating. In Erlang you don't have a choice. Golang also doesn't have something like supervision trees (threads that die and restart together). So in practice Golang and Erlang concurrency are very different.

Interesting, so Golang channels are basically a hybrid between the two approaches.

I envy Rust and its borrow checker. It's a pain to get used to, but it enables a "shared when you say it is" concurrency model with no overhead. No message passing overhead, optional but safe mutability, no data corruption possibility, zero copy basically everywhere.