by pron 143 days ago
The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler) is the warmup time spent waiting for the JIT.

The true penalty of Java is that product types have to be heap-allocated, as there is no mechanism for stack-allocated product types.
You're right that Java lacks inline types (although it's getting them really soon now), but the main cost of that isn't stack allocation (heap allocations in Java don't cost much more than stack allocations); it's the cache misses caused by objects not being inlined in arrays.
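
To make the layout point concrete, here is a minimal sketch (all class and method names are hypothetical, not from the thread). It contrasts an array of `Point` references, where each element is a separate heap object, with a hand-flattened `double[]` holding the same data contiguously. The second form is roughly the layout that inlining objects in arrays would provide; no timings are claimed, since those depend on heap state and hardware.

```java
// Hypothetical illustration: both sums are identical, only the memory layout differs.
final class LayoutSketch {
    // An ordinary identity class: a Point[] stores references to separate heap objects.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Each element access follows a reference, which may miss the cache.
    static double sumBoxed(Point[] points) {
        double sum = 0;
        for (Point p : points) sum += p.x + p.y;
        return sum;
    }

    // Hand-flattened layout: x and y interleaved in one contiguous array.
    static double sumFlat(double[] xy) {
        double sum = 0;
        for (int i = 0; i < xy.length; i += 2) sum += xy[i] + xy[i + 1];
        return sum;
    }

    static Point[] makeBoxed(int n) {
        Point[] pts = new Point[n];
        for (int i = 0; i < n; i++) pts[i] = new Point(i, 2.0 * i);
        return pts;
    }

    static double[] makeFlat(int n) {
        double[] xy = new double[2 * n];
        for (int i = 0; i < n; i++) { xy[2 * i] = i; xy[2 * i + 1] = 2.0 * i; }
        return xy;
    }
}
```

Freshly allocated objects often end up next to each other anyway, so the difference tends to show up mostly after the heap has been shuffled by a long-running program.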
P.S.

Even for flattened types, the "abstraction penalty", or, more precisely, its converse, the "concreteness penalty", in Java will be low, as you don't directly pick when an object is flattened. Instead, you declare whether a class cares about identity or not, and if not, the compiler will transparently choose whether and when to flatten the object, depending on how it's used.

> product types have to be heap-allocated

Conceptually, that’s true, but a compiler is free to do things differently. For example, if escape analysis shows that an object allocated in a block never escapes the block, the optimizer can replace the object by local variables, one for each field in the object.

And that’s not theoretical. https://www.bettercodebytes.com/allocation-elimination-when-..., https://medium.com/@souvanik.saha/are-java-objects-always-cr... show that it (sometimes) does.
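
A minimal sketch of the kind of code where this can apply (the names are hypothetical). In `sumWithObjects` no `Vec` is stored or returned, so after inlining the optimizer is free to treat the fields as locals. `sumWithLocals` is a hand-written version of what that would leave behind. Whether the JIT actually eliminates the allocations in a given run is not guaranteed.

```java
// Hypothetical example of non-escaping temporaries; both methods return the same value.
final class EscapeSketch {
    static final class Vec {
        final int x, y;
        Vec(int x, int y) { this.x = x; this.y = y; }
        Vec plus(Vec o) { return new Vec(x + o.x, y + o.y); }
    }

    // Three Vec objects per iteration, none of which escapes the loop body.
    static long sumWithObjects(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Vec v = new Vec(i, i + 1).plus(new Vec(1, 1));
            total += v.x + v.y;
        }
        return total;
    }

    // The same computation with each field replaced by a local variable.
    static long sumWithLocals(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            int x = i + 1;
            int y = (i + 1) + 1;
            total += x + y;
        }
        return total;
    }
}
```

On HotSpot, escape analysis is enabled by default in the optimizing compiler; running with `-XX:-DoEscapeAnalysis` is one way to compare allocation behaviour with and without it.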

It's a statement of our times that this is getting downvoted. JIT is so underrated.
In my opinion, this assertion suffers somewhat from the "sufficiently smart compiler" fallacy.

https://wiki.c2.com/?SufficientlySmartCompiler

No, Java's existing compiler is very good, and it generates as good code as you'd want. There is definitely still a cost due to objects not being inlined in arrays yet (this will change soon) that impacts some programs, but in practice Java performs more-or-less the same as C++.

In this case, however, it appears that the Java program may have been configured in a suboptimal way. I don't know how much of an impact it has here, but it can be very big.

Even benchmarks that allow for JIT warmup consistently show Java at roughly half the speed of C/C++/Rust. Is there something they are doing wrong? I've seen people write some really unusual Java to eliminate all runtime allocations, but that was about latency, not throughput.
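
For readers unsure what "allowing for JIT warmup" means in practice, here is a rough sketch (hypothetical names, and a toy workload chosen only for illustration). It times the same method over several rounds; the early rounds typically include interpretation and compilation time, so a benchmark would discard them. This is not a substitute for a real harness such as JMH.

```java
// Hypothetical warmup demonstration: per-round timings of the same workload.
final class WarmupSketch {
    // Toy CPU-bound workload; any deterministic method would do.
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) acc += (long) i * i % 7;
        return acc;
    }

    // Runs `rounds` timed rounds and returns the elapsed nanoseconds of each.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        long sink = 0; // accumulate results so the work cannot be discarded
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += work(n);
            nanos[r] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) System.out.println(sink); // keeps sink observable
        return nanos;
    }
}
```

Printing the array from something like `WarmupSketch.timeRounds(20, 5_000_000)` usually shows the first few rounds as the slowest, though the exact shape varies by JVM and machine.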
> Is there something they are doing wrong?

Yes. The most common issues are heap misconfiguration (which is more important in Java than any compiler configuration in other languages) and that the benchmarks don't simulate realistic workloads in terms of both memory usage and concurrency. Another big issue is that the effort put into the program is not the same. Low-level languages do allow you to get better performance than Java if you put in significant extra work to get it. Java aims to be "the fastest" for a "normal" amount of effort, at the expense of losing some control that could translate to better performance in exchange for significantly more work, both at initial development time and especially during evolution/maintenance.
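
As a small aid for checking the heap configuration a benchmark actually ran with, the sketch below (hypothetical class name) reports the limits the running JVM sees, using the standard `Runtime` API. It only reports values; it does not say what a good setting would be for a given workload.

```java
// Hypothetical helper: reports the heap limits visible to the running JVM.
final class HeapSketch {
    static String describeHeap() {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        return "max=" + rt.maxMemory() / mib + "MiB"          // upper bound, e.g. set by -Xmx
             + " committed=" + rt.totalMemory() / mib + "MiB" // heap currently reserved by the JVM
             + " free=" + rt.freeMemory() / mib + "MiB";      // unused part of the committed heap
    }
}
```

Calling `HeapSketch.describeHeap()` at startup and comparing runs launched with different `-Xms`/`-Xmx` values is one simple way to confirm the settings took effect.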

E.g. I know of a project at one of the world's top 5 software companies where they wanted to migrate a real Java program to C++ or Rust to get better performance (it was probably Rust because there are some people out there who really want to try Rust). Unsurprisingly, they got significantly worse performance (probably because low-level languages are not good at memory management when concurrency is at play, or at concurrency in general). But they wanted the experiment to be a success, so they put in a tonne of effort - I'm talking many months - hand-optimising the code, and in the end they managed to match Java's performance or even exceed it by a bit (but admitted it was ultimately wasted effort).

If the performance of your Java program doesn't more-or-less match or even exceed the performance of a C++ (or other low level language) program then the cause is one of: 1. you've spent more effort optimising the other program, 2. you've misconfigured the Java program (probably a bad heap-size setting), or 3. the program relies on object flattening, which means the Java program will suffer from costly cache misses (until Valhalla arrives, which is expected to be very soon).

In my experience, if your C++ or Rust code does not perform as well as Java, it's probably because you are trying to write Java in C++ or Rust. Java can handle a large number of small heap-allocated objects shared between threads really well. You can't reasonably expect to match its performance in such workloads with the rudimentary tools provided by the C++ or Rust standard library. If you want performance, you have to structure the C++/Rust program in a fundamentally different way.

I was not familiar with the term "object flattening", but apparently it just means storing data by value inside a struct. But data layout is exactly the thing you should be thinking about when you are trying to write performant code. As a first approximation, performance means taking advantage of throughput and avoiding latency, and low-level languages give you more tools for that. If you get the layout right, efficient code should be easy to write. Optimization is sometimes necessary, but it's often not very cost-effective, and it can't save you from poor design.

> it's probably because you are trying to write Java in C++ or Rust

Well, sure. In principle, we know that for every Java program there exists a C++ program that performs at least as well because HotSpot is such a program (i.e. the Java program itself can be seen as a C++ program with some data as input). The question is can you match Java's performance without significantly increasing the cost of development and especially evolution in a way that makes the tradeoff worthwhile? That is quite hard to do, and gets harder and harder the bigger the program gets.

> I was not familiar with the term "object flattening", but apparently it just means storing data by value inside a struct. But data layout is exactly the thing you should be thinking about when you are trying to write performant code.

Of course, but that's why Java is getting flattened objects.

> As a first approximation, performance means taking advantage of throughput and avoiding latency, and low-level languages give you more tools for that

Only at the margins. These benefits are small and they're getting smaller. More significant performance benefits can only be had if virtually all objects in the program have very regular lifetimes - in other words, can be allocated in arenas - which is why I think it's Zig that's particularly suited to squeezing out the last drops of performance that are still left on the table.
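
For anyone unfamiliar with the term, here is a minimal sketch of the arena idea, written in Java only for consistency with the rest of the thread (the class and its methods are hypothetical, and it hands out indices into a `long[]` rather than real pointers). Allocation is a bump of one counter, and everything is released together by `reset()`, which is why it only works when all the objects share a lifetime.

```java
// Hypothetical bump-pointer arena over a fixed array of slots.
final class ArenaSketch {
    private final long[] slots;
    private int top = 0; // index of the next unused slot

    ArenaSketch(int capacity) { slots = new long[capacity]; }

    // Reserves `count` consecutive slots and returns the index of the first one.
    int allocate(int count) {
        if (count < 0) throw new IllegalArgumentException("negative count");
        if (count > slots.length - top) throw new IllegalStateException("arena exhausted");
        int base = top;
        top += count;
        return base;
    }

    void set(int index, long value) { slots[index] = value; }
    long get(int index) { return slots[index]; }
    int used() { return top; }

    // Releases every allocation at once; nothing is freed individually.
    void reset() { top = 0; }
}
```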

Other than that, there's not much left to gain in performance (at least after Java gets flattened objects), which is why the use of low-level languages has been shrinking for a couple of decades now and continues to shrink. Perhaps it would change when AI agents can actually code everything, but then they might as well be programming in machine code.

What low-level languages really give you through better hardware control is not performance, but the ability to target very restricted environments with not much memory (as one of Java's greatest performance tricks is the ability to convert RAM to CPU savings on memory management) assuming you're willing to put in the effort. They're also useful, for that reason, for things that are supposed to sit in the background, such as kernels and drivers.

> Java in C++ or Rust.

This criticism always forgets that "Java style" is how most folks used to program in C++ARM and in 100% of the 1990s GUI frameworks written in C++, and that the GoF book used C++ and Smalltalk, predating Java by a couple of years.

Has anyone done a fork of the benchmark game or plb2 to demonstrate the impacts of jit warmup and heap settings?
I don't know what plb2 is, but the benchmark game can demonstrate very little because the benchmarks are small and uninteresting compared to real programs (I believe there's not a single one with concurrency, plus there's no measure of effort in such small programs) and they compare different algorithms against each other.

For example, what can you learn from the Java vs. C++ comparison? In 7 out of 10 benchmarks there's no clear winner (the programs in one language aren't faster than all programs in the other) and what can you generalise from the 3 where C++ wins? There just isn't much signal there in the first place.

The TechEmpower benchmarks explore workloads that are probably more interesting, but they also compare apples to oranges, and like with the benchmark game, the only conclusion you could conceivably generalise (in an age of optimising compilers, CPU caches, and machine-learning branch predictors, all affected by context) is that C++ (or Rust) and Java are about the same, as there are no benchmarks in which all C++ or Rust frameworks are faster than all Java ones or vice-versa, so there's no way of telling whether there is some language advantage or particular optimisation work done that helps a specific benchmark (you could try looking at variances, but given the lack of a rigorous comparison, that's probably also meaningless). The differences there are obviously within the level of noise.

Companies that care about and understand performance pick languages based on their own experience and experiments, hopefully ones that are tailored to their particular program types and workloads.

The linked article makes a specific carveout for Java, on the grounds that its SufficientlySmartCompiler is real, not hypothetical.
C++ certainly also has and needs a similarly sufficiently smart compiler to be compiled at all…