Hacker News
by stcredzero 3397 days ago
Java is showing quite impressive numbers! 50% overhead over native C implementations was often cited as a good guess for the ultimate efficiency of JIT code generation back in the Self/HotSpot days.

People who were trying to castigate Go early on as having "Java-like speeds" were really just showing their ignorance of the state of the art of JIT compilation for managed languages and the JVM. Such outdated folk knowledge of performance in the programming field seems to be a constant over the decades. (Programmers have had such distorted views since the mid-'80s at least.) Maybe this kind of knowledge needs to be used in job interview questions for a while? Very soon, people will just memorize such trivia for interviews, but it would serve to squash this form of folk programming "alternative fact."

5 comments

HotSpot has had great speeds for numeric computation at least since 2005. I was doing financial software in Java in my first job out of college; our CTO was an ex-Sun architect who literally wrote the book on Java, and the speeds we got on numerical computations were basically equivalent to C.

The part where Java really falls down is in memory use & management, which you can see on the binary-tree & mandelbrot benchmarks, where it's roughly 4x slower than C. There are inherent penalties to pointer chasing that you can't get around. While HotSpot is often (amazingly) smart enough to inline & stack-allocate small private structs, typical Java coding style relies on complex object graphs. In C++ or Rust these would all have well-defined object ownership and be contained within a single block of memory, so access is just "add a constant to this pointer, and load". In Java, you often need to trace a graph of pointers 4-5 levels deep, each of which may cause a cache miss.
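To make the pointer-chasing point concrete, here is a minimal Java sketch. The class and field names (`Order`, `Quote`, `Price`, `LayoutSketch`) are made up for illustration and are not from any benchmark program. It only contrasts the two access patterns; it does not measure anything, and the JIT may well optimize a toy like this differently from a real object graph.

```java
import java.util.ArrayList;
import java.util.List;

public class LayoutSketch {
    // Object-graph style: every level is a separate heap object (hypothetical types).
    static final class Price { final double amount; Price(double a) { amount = a; } }
    static final class Quote { final Price price; Quote(Price p) { price = p; } }
    static final class Order { final Quote quote; Order(Quote q) { quote = q; } }

    // Each element requires following order -> quote -> price before the
    // value is available; every hop is a dependent load that may miss cache.
    static double sumGraph(List<Order> orders) {
        double total = 0.0;
        for (Order o : orders) {
            total += o.quote.price.amount;
        }
        return total;
    }

    // Flat style: the values sit in one contiguous primitive array,
    // so access is a sequential walk over a single block of memory.
    static double sumFlat(double[] amounts) {
        double total = 0.0;
        for (double a : amounts) {
            total += a;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000;
        List<Order> orders = new ArrayList<>(n);
        double[] amounts = new double[n];
        for (int i = 0; i < n; i++) {
            amounts[i] = i;
            orders.add(new Order(new Quote(new Price(i))));
        }
        // Same result either way; only the memory access pattern differs.
        System.out.println(sumGraph(orders) == sumFlat(amounts));
    }
}
```

Both methods compute the same sum; the difference is how many separate heap objects have to be touched to get there.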

Rule of thumb while I was at Google was to figure on real-world Java being about 2-3x slower than real-world C++.

> The part where Java really falls down is in memory use & management, which you can see on the binary-tree & mandelbrot benchmarks, where it's roughly 4x slower than C.

binary-tree is not useful for comparing GCed and non-GCed languages. For non-GCed languages, you are allowed to use a memory pool of your choice (the C version uses the Apache Portable Runtime library), for GCed languages you are required to use the standard GC with the default settings (no adjustment of GC parameters permitted). This is apples and oranges.

For mandelbrot, the C version uses handcoded SIMD intrinsics. I.e. it's not even portable to non-x86 processors.

> For non-GCed languages, you are allowed to use a memory pool of your choice (the C version uses the Apache Portable Runtime library), for GCed languages you are required to use the standard GC with the default settings (no adjustment of GC parameters permitted). This is apples and oranges.

Doesn't that match how a library would be used in the real world? A C library can create its own memory pool, but a GCed one has to live with however its host is configured.

If I were to run a performance-critical application, I'd definitely tune the GC accordingly. It's why the JVM offers several garbage collectors in the first place, for example.

Also, GCed languages aren't prevented from using memory pools, but often they are not part of their common libraries, because there's less need for them.
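As a rough illustration of that last point, a pool in a GCed language can be as simple as keeping released objects on a free list and handing them back out. This is only a sketch under my own assumptions: `BufferPoolSketch` and its methods are hypothetical names, it is not thread-safe, and it is not what any benchmarks game submission does.

```java
import java.util.ArrayDeque;

public class BufferPoolSketch {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations = 0; // counts how often we actually hit the allocator

    BufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Reuse a released buffer if one is available, otherwise allocate a new one.
    byte[] acquire() {
        byte[] buf = free.pollFirst();
        if (buf == null) {
            allocations++;
            buf = new byte[bufferSize];
        }
        return buf;
    }

    // Return a buffer to the pool instead of leaving it for the GC.
    void release(byte[] buf) {
        if (buf.length == bufferSize) {
            free.addFirst(buf);
        }
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(4096);
        for (int i = 0; i < 1000; i++) {
            byte[] buf = pool.acquire();
            buf[0] = (byte) i; // stand-in for real work
            pool.release(buf);
        }
        System.out.println(pool.allocations()); // prints 1: one buffer reused 1000 times
    }
}
```

Whether this beats the default allocator depends on the workload; short-lived small objects are often cheap for a generational GC anyway, which is part of why such pools are less common in standard libraries.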

> If I were to run a performance-critical application, I'd definitely tune the GC accordingly.

But you have to tune it for the performance of the whole application (AFAIK); you can't tune it for an individual algorithm like you can with C. It's a one-size-fits-all approach.

1. That goes towards the other point that I made [1] about how microbenchmarks have only limited relevance for the performance of large applications (the performance of memory pools can also change as a result; as an extreme case, multiple large memory pools can lead to swapping).

2. Many GCs allow you to tune performance for individual computations. For example, Erlang allows you to basically start a new lightweight process with a heap large enough so that collection isn't needed and to throw it away at the end; OCaml's GC parameters can be changed while the program is running.

[1] https://news.ycombinator.com/item?id=13747876

Probably because people are intelligent enough not to compare speeds inside of a vacuum. When someone denigrates a language as "Java-like" they're really just comparing it anecdotally to the sum of all Java projects they've worked with. Rarely is the project a single-purpose, optimized pet project.

Smalltalk was long castigated for being a "slow, poky interpreted language" long, long after it stopped being that in fact. In all of my time as a consultant for the language vendor, never did I ever come across the VM actually being too slow. In something like 90% of the cases, it was due to IO.

Before I left the Smalltalk part of my career behind, someone had the occasion to compare the parser-compiler of one Smalltalk, which was implemented in C with Yacc/Lex, against one implemented in pure Smalltalk on a JIT VM. It turns out that, once the console logging was disabled, the JIT VM's parser was just as fast as the one in C.

In my experience of almost 2 decades, it has been a constant that uninformed programmers are especially uninformed about the relative performance of managed languages.

If only someone was interested in contributing Smalltalk programs written with MatriX to use quad-core --

http://benchmarksgame.alioth.debian.org/u64q/smalltalk.html

Note that in many of the other benchmarks, the fastest Java program takes 2x to 4x longer than the fastest C program. Still not bad! But knucleotide shows Java in a better light than most:

http://benchmarksgame.alioth.debian.org/u64q/java.html

(And as always, it's possible that someone can write a much faster Java program than the ones submitted so far.)

And in many cases it doesn't matter; those 2x to 4x longer run times are still within the acceptable time frame.

There is a lot of folk programming knowledge like this, such as the impact of bounds checking, or the belief that all game consoles except for Xbox use OpenGL.

Also, many younger developers believe that C was always fast, and are unaware that early 8 and 16 bit compilers for home computers were like managed languages. The compilers generated way worse code than hobbyist assembly developers wrote by hand.

> early 8 and 16 bit compilers for home computers were like managed languages

Yeah, I wrote several published games entirely in assembly back when that was really the only reasonable option.

Of course, some of the things we had to deal with meant that even if the compilers were good, they couldn't have been good enough.

Extremely limited memory is the obvious one, but pages that need to be swapped out at runtime are the other one. An example: The Game Boy had 64k of address space, but would ship with 128k and larger cartridges. 32k of the memory space was reserved for video memory, RAM, and hardware registers (if I remember correctly). The first 16k of memory was always mapped to the first 16k of ROM, but the second 16k of memory was mapped to arbitrary 16k blocks of ROM. 16k wasn't enough space to hold your entire game, so some code would leak into other pages (hopefully not more than ONE other page), but you also had to be able to swap other ROM pages into that second 16k memory region to, e.g., load graphics and game data.

There isn't a compiler around even today that can juggle all of that automatically. At a minimum you'd need to be marking different functions as belonging to different memory regions, but you'd still have to manually keep track of which function was where and ensure you don't try to call a region-2 function from a region-1 function when region-2 is some arbitrary ROM page instead of the extra code page. But often something in region-2 needs to call region-1 to load graphics into sprite registers and then restore region-2 so that the stack frame becomes valid again. :)
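For anyone who hasn't dealt with banked memory, here is a toy Java model of the layout described above: a fixed first 16k window, a switchable second 16k window, and a helper that swaps a bank in, reads, and restores the caller's bank. It is a simplification built only from the description in this comment; `BankedRomSketch` and its methods are hypothetical names, and real hardware selects banks through the cartridge's controller rather than a method call.

```java
public class BankedRomSketch {
    static final int BANK_SIZE = 16 * 1024; // 16k per ROM bank

    private final byte[] rom;       // the whole cartridge, e.g. 128k = 8 banks
    private int switchableBank = 1; // which bank is visible in the second window

    BankedRomSketch(byte[] rom) {
        if (rom.length < 2 * BANK_SIZE || rom.length % BANK_SIZE != 0) {
            throw new IllegalArgumentException("ROM must be a whole number of banks, at least 2");
        }
        this.rom = rom;
    }

    void selectBank(int bank) {
        if (bank < 0 || bank >= rom.length / BANK_SIZE) {
            throw new IllegalArgumentException("no such bank: " + bank);
        }
        switchableBank = bank;
    }

    int selectedBank() {
        return switchableBank;
    }

    // Addresses below 16k always hit bank 0; addresses in the second 16k
    // hit whichever bank is currently selected.
    int read(int address) {
        if (address < 0 || address >= 2 * BANK_SIZE) {
            throw new IllegalArgumentException("only the ROM windows are modelled here");
        }
        if (address < BANK_SIZE) {
            return rom[address] & 0xFF;
        }
        return rom[switchableBank * BANK_SIZE + (address - BANK_SIZE)] & 0xFF;
    }

    // The manual dance: swap in a data bank, read, then restore the caller's
    // bank so code living in the switchable window is mapped again on return.
    int readFromBank(int bank, int address) {
        int saved = switchableBank;
        selectBank(bank);
        try {
            return read(address);
        } finally {
            selectBank(saved);
        }
    }

    public static void main(String[] args) {
        byte[] rom = new byte[8 * BANK_SIZE];
        for (int i = 0; i < rom.length; i++) {
            rom[i] = (byte) (i / BANK_SIZE); // tag every byte with its bank number
        }
        BankedRomSketch mem = new BankedRomSketch(rom);
        System.out.println(mem.read(0x0000));            // 0: fixed window
        System.out.println(mem.read(0x4000));            // 1: default switchable bank
        System.out.println(mem.readFromBank(5, 0x4000)); // 5: temporarily swapped in
        System.out.println(mem.selectedBank());          // 1: restored afterwards
    }
}
```

The save/restore in `readFromBank` is exactly the bookkeeping a programmer had to do by hand; forget the restore and the caller returns into whatever happens to be mapped.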

And 32k is SUCH a small amount of memory that being able to chip away at every single function was important. You have a function that does two things, but sometimes you just need to do the second? Put a label halfway through and call directly into the middle. You can save a byte here by loading one register into another, because you know that the registers will have the right values? Or you can save another byte there because you've realized that a particular constant is loaded a lot, and you can store that constant into one of the 128 cheap-memory locations? Do it! We need every byte...

Yes it would be possible to write such a compiler, but the level of effort to customize it to the specific architecture would be extreme. Easier to just make programmers do the hard work.

Nice tricks, it brings back memories from demoscene days!

I was just arguing about plain boring C code. :)

>> Java is showing quite impressive numbers!

359% more RAM isn't very impressive. Even less so when one considers the 20+ years of effort spent to achieve it.

You know what impresses me? A 22-month-old language besting everything else while guaranteeing no segfaults or NPEs at compile time. That's impressive.

Not that I disagree with the general idea of this post, but it's worth pointing out that while Rust is fairly young based on the 1.0 release date, there was a large amount of time prior to that where the language underwent a number of changes.

It's also worth remembering that this is just a game, and as such it's somewhat of an apples-to-oranges comparison. Java's RAM usage may not be impressive compared to C, but that doesn't make the results overall any less impressive, especially knowing what it was like before those 20+ years of effort.

I don't see a reason that both Rust and Java's results can't be impressive in their own right. Java's numbers are (for the most part) impressive compared to C and Rust's numbers are also impressive compared to C, just in a different way.

>> while Rust is fairly young based on the 1.0 release date, there was a large amount of time prior to that where the language underwent a number of changes.

Both Rust and Java had similar intervals between start of development and 1.0 release, which is what I carefully referenced my claims to. The comparison is fair; deliberately conservative actually given Java 1.0 in 1995 (now 22 years ago.)

>> I don't see a reason that both Rust and Java's results can't be impressive in their own right.

The state of the art has moved on. There was a time, back around 2005 or so, when Java pulling to within 50-ish percent of a 45-year-old programming language was impressive. It's old hat now, and there is little evidence the gap is going to close much further.