Hacker News new | ask | show | jobs
by cbsmith 1312 days ago
IMHO, implicit allocations is a bit of a red herring. Yes, in C/C++ heap allocations are proportionately pretty expensive, but I've seen Java programs have just ridiculous amounts of implicit allocations but there really isn't much of a problem.

But allocations aren't the same as copies, and the argument for reference semantics has always been that implicit copies are problematic. In your std::string example, having that many String copies in a Java program would be similarly terrible (and this sometimes happens by accident because of abstraction layers that hide all the copying going on under the covers).

I do think Rust gets a lot of stuff right, but Rust's cognitive load is broadly recognized. I tend to see it as C++ with a lot fewer foot guns. ;-)

3 comments

> Yes, in C/C++ heap allocations are proportionately pretty expensive, but I've seen Java programs have just ridiculous amounts of implicit allocations but there really isn't much of a problem.

Java programs make "ridiculous amounts of implicit allocations" because allocations are cheap in Java. And they need to be cheap because Java doesn't have value semantics so it leans hard on escape analysis + cheap allocations.

I agree with the rest of your comment, although I think most of Rust's "cognitive load" amounts to borrow-checker-vs-garbage-collection. You could envision a Rust with explicit allocations and a GC, and that language would have a "cognitive load" approaching that of Go while also being a fair bit more performant insofar as people can much more easily reason about allocations and thus performance.

> Java programs make "ridiculous amounts of implicit allocations" because allocations are cheap in Java. And they need to be cheap because Java doesn't have value semantics so it leans hard on escape analysis + cheap allocations.

Yes, but that's kind of the point, right? Implicit allocation isn't really a problem because a runtime that optimizes the allocations magically for you is a lot easier to build than a runtime that optimizes whether you really need to be copying objects as much as you do.

> Implicit allocation isn't really a problem because a runtime that optimizes the allocations magically for you is a lot easier to build

As far as I know, Java's (default) runtime gives cheap allocations at the cost of long GC pause times.

> than a runtime that optimizes whether you really need to be copying objects as much as you do

It's not "copying", it's "allocating", and avoiding allocations isn't that much work (and frankly I'm surprised it's such a minor problem that no one has bothered to build an IDE plugin that highlights these allocation points automatically--or at least I haven't heard of such a thing). Anyway, "a runtime that minimizes allocations" is just an escape analyzer and Java has one of these too, and IIRC it's a lot more sophisticated than Go's (but it's also a lot harder to reason about as a consequence).

> As far as I know, Java's (default) runtime gives cheap allocations at the cost of long GC pause times.

"long GC pause times" is kind of vague, so I guess you could be correct, but in practice there's a LOT of different ways the memory management can be handled, many of which are deemed "pauseless GC" (though the term is somewhat misleading).

My statement was considering that reality though. While not true for some use cases, in the vast majority of cases, the runtime optimizes the allocations more than sufficiently.

> It's not "copying", it's "allocating"

Allocators can do a pretty good job of minimizing the overhead of allocation, to the point the amortized cost isn't much more than a single machine instruction. Allocating gigabytes of memory quickly is possible. Copying the data can be a lot more work, and often objects have copy semantics that add a lot more additional work.

> Anyway, "a runtime that minimizes allocations" is just an escape analyzer and Java has one of these too, and IIRC it's a lot more sophisticated than Go's (but it's also a lot harder to reason about as a consequence).

I think you're implicitly saying "a runtime that minimizes heap allocations" there, in which case I'd agree.

> in practice there's a LOT of different ways the memory management can be handled, many of which are deemed "pauseless GC" (though the term is somewhat misleading).

Yes, but I'm pretty sure those "pauseless GC" schemes impose other tradeoffs.

> My statement was considering that reality though. While not true for some use cases, in the vast majority of cases, the runtime optimizes the allocations more than sufficiently.

I'm not sure I follow. The same could be said for Go--in the vast majority of cases, Go's tradeoffs (slow allocations, low latency / non-moving GC) are also suitable.

> Allocators can do a pretty good job of minimizing the overhead of allocation, to the point the amortized cost isn't much more than a single machine instruction.

As far as I know, speeding up allocations to this degree requires a moving GC which imposes a bunch of other constraints (including copying a bunch of memory around).

> Allocating gigabytes of memory quickly is possible. Copying the data can be a lot more work, and often objects have copy semantics that add a lot more additional work.

Yes, but the bottleneck here wasn't the copying, it was the allocations. And if you optimized away allocation cost entirely such that only the copy cost remained, that cost would be so small that the OP would never have bothered to profile because copying small objects like this is so cheap compared to everything else (even if it is expensive compared to bump allocating).

> I think you're implicitly saying "a runtime that minimizes heap allocations" there, in which case I'd agree.

Yes, the allocator and GC are concerned with heap allocations and not stack allocations. I'm using "allocations" as a shorthand for "heap allocations".

In hindsight, I think I chose how to present this poorly, because yes, in this case, the allocation is what is killing the performance. I look at it, and I just see unnecessary implied behaviour creating a performance problem. Usually it isn't the allocations themselves that kill you, but it certainly is the case here.
A long long time ago Rust was a GC language.
OCaml and Standard ML.
Allocations are as damaging as your free function is slow.

Java has a tremendously good GC, so can cope with lots of allocations. Go has an OK one, so needs some help (but mollifying it often pays dividends elsewhere in locality and memory usage too). C++ has your default system heap, good luck.

Historically Java has traded long pause times for fast allocations, although I'm of the impression that it has recently found a way to have its cake and eat it.
Java has been tunable for a long time. Periodically, the recommended tuning changes, or new GC algorithms become available, etc. But it has long been possible to get short pause times with various combinations of choosing the right algorithm and writing your program the right way.

I think what really throws people off here is that getting good performance out of a Java application involves some skills which are alien to C++ programmers, and vice versa. You take an experienced C++ programmer and drop them into a Java codebase, they may have a very poor sense of what is expensive and what is cheap. Vice versa… experienced Java programmers don’t do well in C++ either.

The result is that you have precious few people with any significant, real-world experience fixing performance issues in both languages.

Agreed, but usually tuning for short pause times involves trading off throughput or allocation performance. But at the end of the day, if you aren't allocating a bunch of garbage in the first place, then you don't need to be as concerned about the costs of allocating or cleaning up the garbage. I wish Go did more to make allocations explicit so they could be more easily recognized and avoided; I dislike Java's approach of making allocations even more implicit/idiomatic while trying to fix the cost problem in the runtime (although I admire the ambition).
Parallel garbage collection in Java has been a thing for a long time. With the right tuning you might have very infrequent STW GCs, even in a game as allocation-heavy as Minecraft.
What do you consider a ‘long’ pause time?

I’ve had no issues with Java 17+ under heavy allocation/garbage collection (data encryption pipeline I haven’t tuned to reuse buffers yet), and it’s pause times are on the order of a handful of milliseconds, without meaningful tuning. I think it’s doing something like a GB/s of garbage collection.

And the jvm in question is doing a LOT more than just this, so it’s coping with millions of allocated objects at the same time.

I consider tens of milliseconds to be a long pause time (P99 should be <10ms), which is what I understand to be the ballpark for G1 (I haven't tested 17+). No doubt JVM is a fine piece of engineering, but I prefer Go's approach of just not allocating as much to begin with (Java's cheap allocations require a moving GC, which introduces challenges with respect to rewriting pointers and so on, which in turn introduces constraints for FFI and makes the whole system more complex). I wish it went further and made allocations more explicit.
I’m not sure I get what you mean. You wouldn’t have that many String copies in Java by passing an unchanged String down the call stack. My point is that it’s too easy to make this mistake in C++.
In Java, the mistake happens only when there's an abstraction that hides the copying from you, so it isn't implicit in the same way, but it's still implicit.