Hacker News new | ask | show | jobs
by StefanKarpinski 1910 days ago
I mean I’m not trying to hate on Java — pointer-heavy programming was all the rage when it was designed, and GC was a hot research topic, so there was good reason to be optimistic about that approach. But it turns out that it’s very hard to make up for generating tons of garbage and pointer-heavy programming hasn’t aged well given the way hardware has evolved (pointers are large and indirection is expensive).
3 comments

You're mixing two things here: memory management and memory layout. This "pointer-heavy programming" is, indeed, a bad fit for modern hardware in terms of processing speed due to cache misses, which is why even Java is now getting user-defined primitive types (aka inline types, aka value types), but in terms of memory management, in recent versions OpenJDK is pretty spectacular, not only in throughput but also latency (ZGC in JDK 16 has sub millisecond maximum pause time for any size of heap and up to a very respectable allocation rate: https://malloc.se/blog/zgc-jdk16 and both throughput and max allocation rate are expected to grow drastically in the coming year with ZGC becoming generational). As far as performance is concerned, GC can now be considered a solved problem (albeit one that requires a complex implementation); the only real price you pay is in footprint overhead.
I'm not — memory layout and memory management are (fairly obviously, I would think) intimately related. In particular, pointer-heavy memory layouts put way more stress on the garbage collector. Java's choice of making objects mutable, subtypeable and have reference semantics, basically forces them to be individually heap-allocated and accessed via pointers. On the other hand, if you design your language so that you can avoid heap allocating lots of individual objects, then you can get away with a much simpler garbage collector. Java only needs spectacular GC technology because the language is designed in such a way that it generates a spectacular amount of garbage.
I would say no. To have stellar performance, you'll need compaction, you'll need parallelism (of GC threads), and you'll need concurrency between the GC threads and mutator threads; and for good throughput/footprint tradeoff you'll need generational collection. True, you might not need to contend with allocation rates that are that high, but getting, say, concurrent compaction (as in ZGC) and/or partial collections (as in G1), requires a sophisticated GC. E.g. Go isn't as pointer-heavy as (pre-Valhalla) Java, and its GC is simple and offers very good latency, but it doesn't compact and it throttles, leading to lower throughput (I mean total program sluggishness) than you'd see in Java, even with a much higher allocation rate. The thing is that even with a low allocation rate, you'd get some challenging heaps, only later, say, every 10 seconds instead of every 5.

It's true that a simpler GC might get you acceptable performance for your requirements if your allocation rate is relatively low, but you still won't get OpenJDK performance. So I'd say that if you design your language to require fewer objects, then you can get by with a simple GC if your performance requirements aren't too demanding.

All that dereferencing puts a higher load on data structure traversal (which is why Java is getting "flattenable" types) than on the GC. The main reason for Java's particular GC challenges isn't its pointer-heavy (pre-Valhalla) design but the mere fact that it is the GCed platform that sees the heaviest workloads and most challenging requirements by far. Java's GC needs to work hard mostly for the simple reason that Java is asked to do a lot (and the better some automated mechanism works, the more people push it).

Go has only concurrency of the features you talk about but is often competitive with Java in benchmarks. In my experience I’ll take administration of go processes any day of the week over Java — I’ve lost count of the number of hours lost to debugging gc stalls, runaway heaps and other garbage collector related issues never once had those problems in go.

Go even reverted a generational collector because it had no performance benefits since most generational objects would be stack allocated anyway — Julia’s JIT and way more advanced llvm backend should do even better than go in keeping objects stack local and inline.

It's competitive in pretty forgiving benchmarks. And LLVM is way more advanced than Go's compiler, but not OpenJDK's. I'm not saying you have to prefer Java to Go, but its throughput is better. As to the stack-allocation claim, young generations might be hundreds of MBs; that might correspond to the stacks of 100K goroutines on some server workloads, but not of a few threads.

So I'm not saying you must prefer Java to Go (even though GC tuning is a thing of the past as of JDK 15 or 16), or that Go's performance isn't adequate for many reasonable workloads, only that 1. a flatter object landscape might still not match Java's memory management performance without sophisticated GCs, and 2. I wouldn't extrapolate from Go to Julia, as they are languages targeting very different workloads. E.g. Julia might well prefer higher throughput over lower latency, and Go's GC's throughput is not great.

Having a Lamborghini racing a Toyota Corolla is of course going to show the Lambo winning. But if I need to maintain a fleet of them to move 1000 passengers around a city with certain availability guarantees, I'm going with the Toyotas every time.
Hating on Java seems perfectly reasonable to me.
so are you for GC or against GC?

In other posts you actually argue that GCs help you reduce complexity because manual memory management is too much of a hassle.

May be immutable is not the correct term - persistent data structures is what I like support for: that is my use-case.

I think you can have efficient persistent data structures without a GC, but that requires fast reference counting and in turn, that requires a lot of work to be competitive with the JVM.

I also understand that my use-case is not Julia's focus. That's perfectly fine.

That's a major oversimplification. GC is good for ease of use and safety of a high level language. GC is never as performant as not requiring heap allocations at all. Julia has a GC, but also provides a lot of tools to avoid needing the GC in high performance computations. This combination gives ease of use and performance.

Java sacrifices some performance for having this "one paradigm" of all objects, and then heavily invested in the GC, but in many cases like writing a BLAS it still just will not give performance exactly matching a highly tuned code, where as in Julia for example you can write really fast BLAS codes like Octavian.jl.

Julia is multi-paradigm in a way that is purposely designed for how these features compose. I think it's important to appreciate that design choice, in both its pros and cons.

To tie Octavian.jl into this memory allocation discussion:

Octavian uses stack-allocated temporaries when "packing" left matrix ("A" in "A*B"). These temporaries can have tens of thousands of elements, so that's a non-trivial stack allocation (the memory is mutable to boot). No heap allocations or GC activity needed (just a GC.@preserve to mark its lifetime). If I understand correctly, this isn't something that'd be possible in Java?

To be fair, you can also just use preallocated global memory for your temporaries, since the maximum amount of memory needed is known ahead of time.

I don't know that the object model is why writing a BLAS in Java doesn't make sense. After all they special case `float` and `double` as primitives, which bifurcates the whole type system and is its own whole issue, but means that you can store them efficiently inline. I'm actually not sure what stops someone from writing a BLAS in Java except that it would be hard and there's no point.
I like your response, and yes, it was a major oversimplification and I'm sorry for that.

Indeed, it is always about design choices and trade-offs. I can see why BLAS code is important and why Julia is an optimal choice for computation heavy problems.

I love GC — it solves a ton of nasty problems in a programming language design with a single feature that users mostly don't have to worry about. Just because you have a GC, however, doesn't mean that it's a good idea to generate as much garbage as you can — garbage collection isn't free. That's where Java IMO went wrong. Java's design — objects are and subtypeable (by default) and mutable with reference semantics — generates an absolute epic amount of garbage. It seems like the hope was that improvements in GC technology would make this a non-issue in the future, but we're in the future and it hasn't turned out that way: even with vast amounts of money that have been spent on JVM GCs, garbage is still often an issue in Java. And this has given GC in general a bad name IMO quite unfairly. It just happens that Java simultaneously popularized GC and gave it a bad name by having a design that made it virtually impossible for the GC to keep up with the amount of garbage that was generated.

It is entirely possible to design a garbage collected language that doesn't generate so goddamned much garbage — and this works much, much better because a relatively simple GC can easily keep up. Julia and Go are good examples of this. Julia uses immutable types extensively and by default, while Go uses value semantics, which has a similar effect on garbage (but has other issues). With a language design that doesn't spew so much garbage, if you only care about throughput, a relatively simple generational mark-and-sweep collector is totally fine. This is what Julia has. If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when it's time slice is up and resumed later).

Persistent data structures are a whole different question that I haven't really spent much time thinking about. Clojure seems to be the state of the art there but I have no idea if that's because of the JVM or despite it.

> If you also want to minimize GC pause latency, then you need to get fancier like Go (I think they have a concurrent collector that can be paused when it's time slice is up and resumed later).

How possible would it be for Julia to add this? I keep thinking Julia would be great for graphical environments and gaming, but high GC latency won't work there.

Very doable, “just” a bunch of moderately tricky compiler work. Will happen at some point. Things that would make it happen sooner: someone interested in compiler work decides to do it; some company decides to fund it.
Thanks for the reply!

Unfortunately, persistent data structures tend to produce (short-lived) garbage which the JVM is very good at collecting!

So yes, Clojure benefits immensely from the JVM.

It is also an interesting research topic whether (optimised) reference counting would be a better approach.

Regarding objects, there is also a "middle ground" to consider:

Split big (immutable) arrays in smaller ones, connect them with some pointers in between, and you are still cache friendly.

Also, you can do a lot on the application level to reduce garbage, and most Java programmers don't care for that exactly because of JVM.

About the refcounting approach, you may want to look at the Perceus paper. It's refcounting with dynamic reuse of memory that isn't shared (like a sort of runtime linear typing), and it's used in Koka for functional programming.
> garbage is still often an issue in Java.

Not anymore. That future is here. Java is getting "flattenable" types not because of GC, but because of iteration.

Has a version of Java with value types been released?
No, but there's finally a JEP, which normally means that release is imminent (I'm not involved with that project, so I have no inside information): https://openjdk.java.net/jeps/401

As part of this change, existing built-in primitive types (like int or double) will retroactively become instances of these more general objects.