A “complete”—i.e. functioning—tracing GC is a weekend project. (Mark-sweep, mark-compact, or stop-and-copy, take your pick.) Perhaps not as simple as basic unoptimized reference counting, but still not hard.
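To back up the "weekend project" claim, here is a minimal mark-sweep sketch in Python over a simulated heap. `Obj`, `Heap`, and the explicit `roots` list are illustrative assumptions; a real collector scans machine stacks, registers, and raw memory instead of Python lists. Note that the unreachable self-cycle gets reclaimed, which basic reference counting cannot do on its own.

```python
class Obj:
    """A simulated heap object with outgoing references."""
    def __init__(self, name):
        self.name = name
        self.refs = []        # pointers to other heap objects
        self.marked = False


class Heap:
    """Toy heap: a list of every allocated object plus a hypothetical root set."""
    def __init__(self):
        self.objects = []
        self.roots = []       # stand-in for stack and global roots

    def alloc(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def mark(self):
        # Iterative depth-first traversal from the roots.
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if obj.marked:
                continue
            obj.marked = True
            stack.extend(obj.refs)

    def sweep(self):
        # Keep marked objects (clearing the mark for the next cycle); drop the rest.
        live = []
        for obj in self.objects:
            if obj.marked:
                obj.marked = False
                live.append(obj)
        freed = len(self.objects) - len(live)
        self.objects = live
        return freed

    def collect(self):
        self.mark()
        return self.sweep()


heap = Heap()
a, b, c = heap.alloc("a"), heap.alloc("b"), heap.alloc("c")
a.refs.append(b)
c.refs.append(c)          # unreachable self-cycle: refcounting alone would leak it
heap.roots.append(a)
print(heap.collect())     # 1
```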
The hard part, the one that has occupied JVM engineers for almost three decades now, comes afterwards: when you try to make things not freeze when memory is low, or when you have multiple threads mutating the same heap, or ultimately when you’re adapting the GC to the particulars of your language. (E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it.)
So a GC being tracing and not refcounting is not really a useful benchmark. And Go’s GC is undeniably less advanced than OpenJDK’s, simply because almost every other GC is. It can still suit Go’s purposes, but it does mean running Java on top of it is bound to yield interesting results.
(And can we please stop pretending C and C++ are in any way close as languages? Even if the latter reuses some parts from the former’s runtime.)
> Go’s GC is undeniably less advanced than OpenJDK’s
Java relies very heavily on its GC and tends to generate far more short-lived objects needing collection than Go does. Go's approach to memory management learns from this: it focuses on creating fewer short-lived heap objects and on keeping GC pauses much shorter than Java's. Go's GC is definitely less complex than Java's, but in my experience it's also very performant and a lot less trouble.
> E.g. Haskell has an awesome concurrent GC that’d work like crap for Java, because it assumes tons of really short-lived, really small garbage and almost no mutation. The other way around is also bound to be problematic—I don’t know how the Scala people do it
I don't know a ton about Haskell's GC, but on the surface it seems very similar to several of the JVM GC implementations: a generational GC with a concept of a nursery. Java's GCs are very heavily designed around the weak generational hypothesis (i.e., most objects die young) and optimize for short-lived objects, so most implementations have at least a few nursery-type areas, where collection is incredibly cheap, before anything reaches the main heap. Plus, in some cases, some stuff ends up getting allocated on the stack.
The only big difference is that in Haskell there are probably some optimizations you can do if most of your structures are immutable, since nothing in an older generation can refer to something in the nursery. But it isn't super clear to me that this alone makes a big enough difference?
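As a rough illustration of the nursery idea and of the old-to-young problem, here is a toy two-generation heap in Python. `GenHeap`, `write`, and `minor_gc` are hypothetical names, and promoting after a single survival plus an explicit remembered set fed by a write barrier are simplifying assumptions, not a description of any particular JVM collector. A minor collection only traces the nursery, so garbage that dies young is never traced; but because mutable code can store a young pointer into an old object, every such store has to be recorded.

```python
class Obj:
    """A simulated heap object; `old` flips to True once it leaves the nursery."""
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.old = False


class GenHeap:
    """Toy heap with a nursery for new objects and an old generation."""
    def __init__(self):
        self.nursery = []
        self.old_gen = []
        self.roots = []          # stand-in for stack and global roots
        self.remembered = set()  # old objects that may point into the nursery

    def alloc(self, name):
        obj = Obj(name)
        self.nursery.append(obj)
        return obj

    def write(self, src, dst):
        """Store a pointer; the write barrier records old -> young edges."""
        src.refs.append(dst)
        if src.old and not dst.old:
            self.remembered.add(src)

    def minor_gc(self):
        """Collect only the nursery, promote every survivor, return the count freed."""
        live = set()
        stack = list(self.roots)
        for src in self.remembered:      # remembered old objects act as extra roots
            stack.extend(src.refs)
        while stack:
            obj = stack.pop()
            if obj.old or obj in live:   # never trace into the old generation
                continue
            live.add(obj)
            stack.extend(obj.refs)
        survivors = [o for o in self.nursery if o in live]
        for o in survivors:
            o.old = True
        self.old_gen.extend(survivors)
        freed = len(self.nursery) - len(survivors)
        self.nursery = []
        # Survivors drag their young referents along, so no old -> young edges remain.
        self.remembered.clear()
        return freed


heap = GenHeap()
keep = heap.alloc("keep")
heap.roots.append(keep)
for i in range(1000):
    heap.alloc(f"tmp{i}")        # short-lived garbage, never referenced
print(heap.minor_gc())           # 1000: only `keep` survives and is promoted
late = heap.alloc("late")
heap.write(keep, late)           # old -> young store goes through the barrier
print(heap.minor_gc())           # 0: `late` is found via the remembered set
```

A production nursery is usually a copying collector, so it also skips the linear pass over dead objects that this list-based sketch still makes.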
One major simplification you can make is that due to purity, older values _never_ point to newer values. This means when doing generational GC, you don’t have to check for pointers from older generations into newer generations.
This feels wrong. Specifically, doesn't laziness bite you in this scenario? If I make a stream that is realized over GC runs, I would expect that a node in an old generation could point to a realized data element from a newer generation. Why not?
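The disagreement in the two comments above can be made concrete with a toy model in Python. `Node`, `Thunk`, and the global allocation counter standing in for object age are hypothetical. Objects built purely by immutable constructors can only point at objects that already exist, so no old-to-young edge ever appears; forcing a thunk allocates its result after the thunk and writes it back in place, which does create one. Under this model, purity removes the need for a barrier on ordinary construction, but a lazy runtime still has to track updated thunks somehow (e.g. in a remembered set). That is an assumption of the sketch, not a description of GHC's actual runtime.

```python
import itertools

_clock = itertools.count()   # allocation order: a smaller stamp means older


class Node:
    """Immutable heap object: its children are fixed at construction."""
    def __init__(self, *children):
        self.born = next(_clock)
        self.fields = list(children)   # only ever set here, treated as immutable


class Thunk:
    """Suspended computation, overwritten in place with its value once forced."""
    def __init__(self, compute):
        self.born = next(_clock)
        self.compute = compute
        self.value = None

    def force(self):
        if self.compute is not None:
            self.value = self.compute()   # result is allocated *after* the thunk
            self.compute = None           # this in-place update is the mutation
        return self.value


def old_to_young_edges(objs):
    """Return (src, dst) pairs where src points at an object allocated after it."""
    edges = []
    for o in objs:
        targets = o.fields if isinstance(o, Node) else [o.value]
        for t in targets:
            if t is not None and t.born > o.born:
                edges.append((o, t))
    return edges


a = Node(); b = Node(a); c = Node(b)        # each node can only point at older nodes
t = Thunk(lambda: Node(c))                   # unevaluated tail of a lazy stream
print(old_to_young_edges([a, b, c, t]))      # []
r = t.force()
print(len(old_to_young_edges([a, b, c, t, r])))  # 1: the thunk now points at r
```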
> Sure, you're saying that it won't be as performant.
I mean, I expect it won’t be, but that wasn’t really my point, no.
What I wanted to say is that I expect the comparison to be interesting: I might not find Go’s particular brand of simplicity attractive, but I like simple designs in general, and Go’s GC is much less involved than OpenJDK’s while still having received some tuning; it’s neither a weekend toy nor a multi-programmer-century monster. And it’d be interesting to see how much the simpler design really loses to the scariest monster of them all.
> And that's true. IDK if you noticed, but there's no JIT either.
That might have been interesting in a general comparison of Java VMs, but I’m concerned with GCs, and in that light it’s not. It could be that a slow VM is so much slower that the GC difference gets lost in the noise, but given that a genuinely bad GC situation can lock up the mutator for literal seconds, I expect there will be a meaningful comparison independent of the rest of the VM.
>> can we please stop pretending C and C++ are in any way close?
> If we also pretend we don't know why it was named C++.
Marketing gimmick? I’m absolutely fine ignoring people who try to suggest things which are not true through manipulative branding. I don’t feel guilty about that.
To be clear, there absolutely is C-ish C++ in the world, and even if it’s not a lot relatively speaking it’s still a lot of code just because of how much C++ there is overall. And if C-ish code was the mainstream of the language, I’d be fine with this commingling. But it’s not, and neither is it the style the language’s designers are using as their benchmark. That’s been the case for at least a decade. So, no, I don’t think C/C++ is any more justified than, I don’t know, C/C#.
Finally, the name was chosen not only very early in C++’s lifetime but actually fairly early in C’s lifetime as well. When C++ was named, C didn’t even have function prototypes! (Necessarily, as it copied those from C++.) I just don’t see why it matters what Stroustrup’s intentions were when the name was chosen in 1983. A lot has changed in forty years.