by throwaway894345 1317 days ago
I also have a background in C/C++, etc., and I've only ever found myself missing value semantics when I use languages with implicit reference semantics. I guess I always figured the solution was "value semantics with better education / tooling". Education: people should understand value semantics. Tooling: imagine an IDE that highlights allocation points automatically (or perhaps the problem is implicit allocations rather than value semantics?).

> perhaps the problem is implicit allocations rather than value semantics?

I think that’s true. Expensive copies should never have been implicit. There was a story some time ago about a single keypress in the address bar of Chrome causing thousands of memory allocations. The culprit: lots of std::string arguments up and down the call stack.
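
A rough Rust analogue of the pattern described above, as a sketch only: the function names are hypothetical and nothing here is taken from Chrome's code. A layer that receives an owned `String` and passes a clone further down creates a fresh heap buffer per call, while the borrowing variants share the caller's buffer all the way down.

```rust
/// Innermost layer, by value: reports the address of the buffer it was handed.
fn leaf_by_value(s: String) -> usize {
    s.as_ptr() as usize
}

/// Middle layer, by value: it still needs `s` afterwards, so it passes a clone
/// down. That is one extra heap allocation plus a byte copy per call.
fn middle_by_value(s: String) -> usize {
    let inner = leaf_by_value(s.clone());
    let _still_needed = s.len();
    inner
}

/// Innermost layer, borrowing: no allocation, just a pointer and a length.
fn leaf_borrowed(s: &str) -> usize {
    s.as_ptr() as usize
}

/// Middle layer, borrowing: forwards the same borrow.
fn middle_borrowed(s: &str) -> usize {
    leaf_borrowed(s)
}
```

Calling `middle_borrowed(&url)` returns the address of `url`'s own buffer, whereas `middle_by_value(url.clone())` ends up looking at a different buffer two copies removed from the original.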

Rust gets this right, with the hindsight of C++’s example: “a = b” is a move operation by default and clone() is always explicit, except for plain data types where copying is literally memcpy — and those are clearly marked as such by the type system.
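
A minimal sketch of the three cases mentioned above (move, explicit `clone()`, and `Copy` types). The `Point` type and the function names are made up for illustration.

```rust
/// A plain-data type: deriving `Copy` marks it as safe to duplicate bit-for-bit.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// `a = b` on a `String` is a move: the heap buffer is reused, not reallocated.
fn move_keeps_buffer() -> bool {
    let b = String::from("hello");
    let before = b.as_ptr();
    let a = b; // `b` can no longer be used after this line
    a.as_ptr() == before
}

/// A deep copy has to be requested explicitly and gets its own heap buffer.
fn clone_makes_new_buffer() -> bool {
    let b = String::from("hello");
    let a = b.clone();
    a == b && a.as_ptr() != b.as_ptr()
}

/// For a `Copy` type, assignment duplicates the value and the source stays valid.
fn copy_type_stays_usable() -> bool {
    let p = Point { x: 1, y: 2 };
    let q = p; // implicit bitwise copy, allowed because `Point: Copy`
    p == q
}
```

Using `b` after `let a = b;` in `move_keeps_buffer` would be rejected at compile time, which is what makes the expensive copy opt-in rather than opt-out.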

IMHO, implicit allocations are a bit of a red herring. Yes, in C/C++ heap allocations are proportionately pretty expensive, but I've seen Java programs have just ridiculous amounts of implicit allocations but there really isn't much of a problem.

But allocations aren't the same as copies, and the argument for reference semantics has always been that implicit copies are problematic. In your std::string example, having that many String copies in a Java program would be similarly terrible (and this sometimes happens by accident because of abstraction layers that hide all the copying going on under the covers).

I do think Rust gets a lot of stuff right, but Rust's cognitive load is broadly recognized. I tend to see it as C++ with a lot fewer foot guns. ;-)

> Yes, in C/C++ heap allocations are proportionately pretty expensive, but I've seen Java programs have just ridiculous amounts of implicit allocations but there really isn't much of a problem.

Java programs make "ridiculous amounts of implicit allocations" because allocations are cheap in Java. And they need to be cheap because Java doesn't have value semantics so it leans hard on escape analysis + cheap allocations.

I agree with the rest of your comment, although I think most of Rust's "cognitive load" amounts to borrow-checker-vs-garbage-collection. You could envision a Rust with explicit allocations and a GC, and that language would have a "cognitive load" approaching that of Go while also being a fair bit more performant insofar as people can much more easily reason about allocations and thus performance.

> Java programs make "ridiculous amounts of implicit allocations" because allocations are cheap in Java. And they need to be cheap because Java doesn't have value semantics so it leans hard on escape analysis + cheap allocations.

Yes, but that's kind of the point, right? Implicit allocation isn't really a problem because a runtime that optimizes the allocations magically for you is a lot easier to build than a runtime that optimizes whether you really need to be copying objects as much as you do.

> Implicit allocation isn't really a problem because a runtime that optimizes the allocations magically for you is a lot easier to build

As far as I know, Java's (default) runtime gives cheap allocations at the cost of long GC pause times.

> than a runtime that optimizes whether you really need to be copying objects as much as you do

It's not "copying", it's "allocating", and avoiding allocations isn't that much work (and frankly I'm surprised it's such a minor problem that no one has bothered to build an IDE plugin that highlights these allocation points automatically--or at least I haven't heard of such a thing). Anyway, "a runtime that minimizes allocations" is just an escape analyzer and Java has one of these too, and IIRC it's a lot more sophisticated than Go's (but it's also a lot harder to reason about as a consequence).

> As far as I know, Java's (default) runtime gives cheap allocations at the cost of long GC pause times.

"long GC pause times" is kind of vague, so I guess you could be correct, but in practice there are a LOT of different ways the memory management can be handled, many of which are deemed "pauseless GC" (though the term is somewhat misleading).

My statement was considering that reality though. While not true for some use cases, in the vast majority of cases, the runtime optimizes the allocations more than sufficiently.

> It's not "copying", it's "allocating"

Allocators can do a pretty good job of minimizing the overhead of allocation, to the point the amortized cost isn't much more than a single machine instruction. Allocating gigabytes of memory quickly is possible. Copying the data can be a lot more work, and often objects have copy semantics that add a lot of additional work.

> Anyway, "a runtime that minimizes allocations" is just an escape analyzer and Java has one of these too, and IIRC it's a lot more sophisticated than Go's (but it's also a lot harder to reason about as a consequence).

I think you're implicitly saying "a runtime that minimizes heap allocations" there, in which case I'd agree.

A long long time ago Rust was a GC language.
OCaml and Standard ML.
Allocations are as damaging as your free function is slow.

Java has a tremendously good GC, so can cope with lots of allocations. Go has an OK one, so needs some help (but mollifying it often pays dividends elsewhere in locality and memory usage too). C++ has your default system heap, good luck.

Historically Java has traded long pause times for fast allocations, although I'm of the impression that it has recently found a way to have its cake and eat it.
Java has been tunable for a long time. Periodically, the recommended tuning changes, or new GC algorithms become available, etc. But it has long been possible to get short pause times with various combinations of choosing the right algorithm and writing your program the right way.

I think what really throws people off here is that getting good performance out of a Java application involves some skills which are alien to C++ programmers, and vice versa. You take an experienced C++ programmer and drop them into a Java codebase, they may have a very poor sense of what is expensive and what is cheap. Vice versa… experienced Java programmers don’t do well in C++ either.

The result is that you have precious few people with any significant, real-world experience fixing performance issues in both languages.

Agreed, but usually tuning for short pause times involves trading off throughput or allocation performance. But at the end of the day, if you aren't allocating a bunch of garbage in the first place, then you don't need to be as concerned about the costs of allocating or cleaning up the garbage. I wish Go did more to make allocations explicit so they could be more easily recognized and avoided; I dislike Java's approach of making allocations even more implicit/idiomatic while trying to fix the cost problem in the runtime (although I admire the ambition).
I’m not sure I get what you mean. You wouldn’t have that many String copies in Java by passing an unchanged String down the call stack. My point is that it’s too easy to make this mistake in C++.
In Java, the mistake happens only when there's an abstraction that hides the copying from you, so it isn't implicit in the same way, but it's still implicit.
> Rust gets this right, with the hindsight of C++’s example: “a = b” is a move operation by default and clone() is always explicit

Note that a move can still do a copy; in fact, Rust is kinda notorious for generating more on-stack memory copy operations than C++. It’s slowly improving, but it can still be surprisingly bad in some cases.

> except for plain data types where copying is literally memcpy

What do you mean by this? If I say `let x = 5; let y = x;` in Rust, that's a "plain data type copy" of a stack value, but memcpy is usually used to copy heap memory. What connection between copying of primitive simple stack values and memcpy are you suggesting here?

The compiler can optimize memcpy with a known size into a small number of move instructions so they are identical to copying stack values.

Try playing with memcpy on Godbolt and you'll find that the compiler will compile the memcpy to a single mov instruction when the size is small, to a few movdqu/movups instructions when the size is slightly larger, and to a function call only when the size is huge.

> memcpy is usually used to copy heap memory

memcpy is often used in low-level serialization / deserialization code since you can't just cast a buffer pointer to a uint32_t pointer and dereference that; the solution is memcpy between variables that are often both on the stack.
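
The comment above is about C and C++, but the same idea can be sketched in Rust: rather than reinterpreting a byte pointer as a `u32` pointer, the bytes are copied into a small stack array and converted. `read_u32_le` is a hypothetical helper, and little-endian encoding is an assumption made for the example.

```rust
/// Reads a little-endian `u32` from `buf` starting at `offset`.
/// Returns `None` if fewer than four bytes are available there.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let bytes = buf.get(offset..end)?;
    let mut tmp = [0u8; 4];
    // The memcpy-like step: copy four bytes into a stack array. No alignment
    // requirement is placed on `buf`, unlike a pointer cast and dereference.
    tmp.copy_from_slice(bytes);
    Some(u32::from_le_bytes(tmp))
}
```

Both the source slice and `tmp` can live on the stack, so nothing here involves the heap.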

> What connection between copying of primitive simple stack values and memcpy are you suggesting here?

They're just using 'memcpy' as a shorthand for saying the bitpattern is blitted. Semantically, that's like a memcpy. The point is, there are no expensive allocations, nor do any embedded pointer fields need to be adjusted, etc.
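
To make the "semantically like a memcpy" point concrete, here is a small sketch comparing plain assignment of a `Copy` type with an explicit byte blit through `std::ptr::copy_nonoverlapping`. The `Pair` type and both function names are illustrative assumptions.

```rust
use std::mem::MaybeUninit;
use std::ptr;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u32,
    b: u32,
}

/// Copies via plain assignment, which is what `let y = x;` does for a `Copy` type.
fn copy_by_assignment(src: &Pair) -> Pair {
    *src
}

/// Copies by blitting the bytes, the way a memcpy would.
fn copy_by_blit(src: &Pair) -> Pair {
    let mut dst = MaybeUninit::<Pair>::uninit();
    // SAFETY: `src` and `dst` are valid, properly aligned, and do not overlap.
    // `Pair` is `Copy`, so duplicating its bytes produces a valid value.
    unsafe {
        ptr::copy_nonoverlapping(src as *const Pair, dst.as_mut_ptr(), 1);
        dst.assume_init()
    }
}
```

Both functions produce equal values. Whether the compiler emits an actual `memcpy` call, a couple of register moves, or nothing at all is up to the optimizer.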

Why do you think memcpy is normally used to copy heap memory? It's just a general bitwise copy from one location to the other.

I think the confusion here is that there isn't always a literal call to memcpy for copying small types like ints in the emitted code, but it's always doing something with the same effect and maybe sometimes using an actual memcpy (probably when copying arrays?).

Also something interesting is that memcpy is used for copying data between stack variables in C sometimes when you need to convert some type to another one without using a cast.

Allocating isn't "an expensive copy"; it's not analogous to clone() in Rust. The copy isn't the problem, it's the allocation.
I'd argue quite the reverse. Allocation can be quite efficient if done properly, but copying involves a lot of other work.
I disagree--the bottleneck here is entirely the allocation. The copying is just a memcpy and it's very fast for small structs like this; like I said, it's not the same as a clone() in Rust, which is a deep copy. If you optimized the allocation away entirely (leaving only the copy cost), there wouldn't have been a significant performance problem and this blog would never have been written.
Actually, you'll find that in Rust, Box::new(stuff) will too often put stuff on the stack before copying it in the newly allocated memory. For large enough stuff, that can be slower than the allocation.
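
One common way to sidestep that intermediate stack value, sketched under assumptions: `BUF_LEN` is an arbitrary illustrative size, and whether `Box::new` really performs the extra copy depends on the compiler version and optimization level. Going through `Vec` allocates the buffer on the heap directly, and the result can then be converted into a boxed slice.

```rust
/// Illustrative size only (64 KiB), kept small enough not to stress the stack.
const BUF_LEN: usize = 1 << 16;

/// The array may be built on the stack first and then copied into the new allocation.
fn boxed_via_stack() -> Box<[u8; BUF_LEN]> {
    Box::new([0u8; BUF_LEN])
}

/// `vec!` allocates the zeroed buffer on the heap, so no large stack temporary is needed.
fn boxed_via_vec() -> Box<[u8]> {
    vec![0u8; BUF_LEN].into_boxed_slice()
}
```

The trade-off is that the second version returns a `Box<[u8]>` whose length is only known at run time, rather than a `Box<[u8; BUF_LEN]>`.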
> I've only ever found myself missing value semantics when I use languages with implicit reference semantics.

Oh, I miss it every time. ;-)

I will say though that some newer languages seem to have a confused idea about how to offer mixed semantics. A bunch of them tie semantics to types. The ideal interface can vary by usage context. It's hard enough getting the semantics right as the callee (as opposed to caller), let alone when you're defining a type that will be used by who knows how many interfaces.

> I guess I always figured the solution was "value semantics with better education / tooling".

I've always thought much the same, but I have slowly come to appreciate that it's more than just education & tooling. Even with good education & tooling, there's a cognitive load that comes with getting interfaces right that for the general case is just not worth it.

I think this is half right. For anything 64 bits or smaller, value semantics are pretty much always going to be better. That said, being able to choose between value and reference semantics for larger objects per object is a pretty useful feature.
> For anything 64 bits or smaller, value semantics are pretty much always going to be better.

That's assuming a 64-bit CPU (which admittedly seems like a reasonable assumption). The nice thing about the abstraction though is that there's nothing preventing the runtime from applying value semantics for those trivial small-object cases where they're obviously more efficient.

Even for a 32-bit CPU a 64-bit type is only two words to copy, and in many cases those "copies" are just register loads. In contrast, with a reference type, even accessing the value means you have to read the reference and then indirectly load the memory it points to. You have to construct something really contrived for a two-word type to end up being more efficient as a reference than as a value.
> I will say though that some newer languages seem to have a confused idea about how to offer mixed semantics. A bunch of them tie semantics to types.

Curious about what you mean here. This sounds like C#'s class/struct distinction to me.

That's exactly the example I was thinking of.
Yeah, I never cared for that. Specifically, I'd prefer that everything was just "struct", but structs could implement interfaces, which is essentially the Go/Rust model.
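
A small sketch of that model in Rust, with made-up `Shape`, `Rect`, and `Square` names: every type is a plain struct, structs implement an interface (a trait), and the choice between value and reference semantics is made by each function signature rather than by the type definition.

```rust
trait Shape {
    fn area(&self) -> f64;
}

#[derive(Clone, Copy)]
struct Rect {
    w: f64,
    h: f64,
}

#[derive(Clone, Copy)]
struct Square {
    side: f64,
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// Value semantics: takes the struct itself, with static dispatch.
fn area_by_value<S: Shape>(shape: S) -> f64 {
    shape.area()
}

/// Reference semantics, chosen by this function: borrowed trait objects with dynamic dispatch.
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The same `Rect` value can be passed to either function, which is the flexibility a class/struct split at the type level does not give you.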
> or perhaps the problem is implicit allocations rather than value semantics

To me, this sounds like it. "Explicit is better than implicit" is a very useful truism.

The counterargument to "explicit is better than implicit" is that abstraction & encapsulation are such significant force multipliers. If done properly, implicit is good. It's just that in the case of copying, doing it "properly" is well nigh impossible.

          explicit  implicit

  good       *    <    *
 
             v    v    v

  bad        *    >    *

Good implicit is better than good explicit. (If all is good, go for implicit.)

Bad explicit is better than bad implicit. (If all is bad, go for explicit; don't hide bad explicit with bad implicit.)

Good explicit or implicit is better than bad explicit or implicit.

> Tooling: imagine an IDE that highlights allocation points automatically

Rider does this already for C#.

The JetBrains IDEs can do this, at least for .NET
VS did it first with Roslyn plugins.