Hacker News
by roetlich 544 days ago
> Also, no point in calling it “tracing garbage collection”.

You're against more explicit naming just for the sake of it? In the literature, reference counting is also referred to as a type of garbage collection, and doesn't involve tracing. If you're talking about a specific context you can probably drop the "tracing", but in a general article like this it would just be very confusing?

This way, someone can google "tracing garbage collection", and will find the relevant wikipedia article: https://en.wikipedia.org/wiki/Tracing_garbage_collection


The literature does not always put the “tracing” in front of the “garbage collection”.

For example, nobody says that Objective-C is garbage collected just because it has ARC. Nobody says that C++ is garbage collected even though shared_ptr is widespread. And systems that do tracing GC just call it GC (see for example https://www.oracle.com/webfolder/technetwork/tutorials/obe/j...)

To think clearly about the tradeoff between GC and RC it’s important to acknowledge the semantic differences:

- GC definitely collects dead cycles.

- RC knows exactly when objects die, which allows for sensible destructor semantics and things like CoW.

- It’s possible to use RC as an optimization in a GC, but then you end up with GC semantics and you still have tracing (hence: if it’s got tracing, it’s a garbage collector).
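
As a rough illustration of the first bullet, here is a minimal sketch that assumes CPython, where objects are reference counted and a separate cycle collector lives in the `gc` module. The `Node` class and `cycle_demo` function are made up for this example. With the cycle collector switched off, a dead cycle survives on reference counts alone; a tracing pass reclaims it.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at one other Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Build a two-object cycle and return only a weak reference to it,
    # so nothing outside the cycle keeps it alive after this returns.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def cycle_demo():
    gc.disable()  # leave only reference counting active
    try:
        probe = make_cycle()
        # The cycle is unreachable, but each node still has a nonzero count.
        alive_under_rc = probe() is not None
        gc.collect()  # explicit tracing pass over container objects
        alive_after_trace = probe() is not None
    finally:
        gc.enable()
    return alive_under_rc, alive_after_trace
```

On CPython, `cycle_demo()` should report that the cycle is still alive under reference counting alone and gone after the explicit collection. Other Python implementations may behave differently.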

It’s a recent fad to say that RC is a kind of GC, but I don’t think it ever took off outside academia. Folks who write GCs call them GCs. Folks who do shared_ptr or ARC say that they don’t use GC.

And it’s good if this fad dies because saying that RC is a kind of GC causes folks to overlook the massive semantic elephant in the room: if you use a GC then you can’t swap it for RC because you’d leak memory (RC would fail to delete cycles), and if you use RC and swap it for a GC then you’d leak resources (your destructors would no longer get called when you expect them to).
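
The destructor half of that argument can be sketched in Python, assuming CPython 3.4 or later (where objects with `__del__` that sit in cycles are finalized by the cycle collector). The `Handle` class, the `events` log, and `finalizer_demo` are hypothetical names for this example. An acyclic object is finalized the moment its last reference goes away; an object that only the collector can reclaim is finalized whenever the collector happens to run.

```python
import gc

events = []  # records the order in which finalizers run


class Handle:
    """Hypothetical resource wrapper whose finalizer 'closes' it."""

    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        events.append(f"closed {self.name}")


def finalizer_demo():
    events.clear()
    gc.disable()  # no automatic tracing passes during the demo
    try:
        h = Handle("acyclic")
        del h  # count drops to zero, finalizer runs right here
        after_rc_release = list(events)

        c = Handle("cyclic")
        c.partner = c  # self-cycle: the count can never reach zero
        del c  # nothing happens yet
        after_cycle_release = list(events)

        gc.collect()  # finalizer runs only when the collector does
        after_collect = list(events)
    finally:
        gc.enable()
    return after_rc_release, after_cycle_release, after_collect
```

Code that relies on the first behavior (closing a file or releasing a lock in a destructor) would see the second behavior for every object if the runtime were swapped for a purely tracing collector.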

On the other hand, it is possible to change the guts of an RC impl without anyone noticing. And it’s possible to change the guts of a GC while preserving compatibility. So these are really two different worlds.

> The literature does not always put the “tracing” in front of the “garbage collection”.

Not always, but often enough that an introductory article that presents an overview of different memory management techniques should maybe use the longer name to avoid confusion.

And I kinda agree with you, using the name "garbage collection" for RC doesn't really make sense, there is no metaphorical garbage truck driving around to collect your unused memory. :)

What's your opinion on the term "RC with cycle detection" that some use for things like Python's GC?

> And it’s possible to change the guts of a GC while preserving compatibility.

Moving to a conservative GC might also introduce memory leaks, if you're unlucky. But yes, "tracing" gc and rc obviously behave very differently, and have very different performance considerations.
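
To make the conservative-GC leak concrete, here is a toy model in Python, not how any real collector is implemented. The heap is a dict from made-up addresses to the words stored in each object, and `conservative_mark` treats every word that equals a valid address as a pointer. All names and addresses are assumptions for illustration.

```python
def conservative_mark(heap, root_words):
    """Return the set of addresses retained by a conservative mark phase.

    heap maps an address to the list of words stored in that object.
    Any word that equals a heap address is assumed to be a pointer,
    because a conservative collector cannot tell integers from pointers.
    """
    marked = set()
    worklist = [w for w in root_words if w in heap]
    while worklist:
        addr = worklist.pop()
        if addr in marked:
            continue
        marked.add(addr)
        worklist.extend(w for w in heap[addr] if w in heap)
    return marked


heap = {
    0x1000: [0x2000],  # live object pointing at 0x2000
    0x2000: [],        # live object
    0x3000: [],        # garbage: no real pointer refers to it
}

# What a precise collector would scan: only the real pointer.
precise_roots = [0x1000]

# What a conservative collector sees on the stack: the real pointer plus
# an ordinary integer (12288) that happens to equal the address 0x3000.
stack_words = [0x1000, 12288]

retained_precisely = conservative_mark(heap, precise_roots)
retained_conservatively = conservative_mark(heap, stack_words)
```

Here the stray integer pins the garbage object at `0x3000`, which is the "if you're unlucky" case: the object is dead but stays in the retained set for as long as that word is on the stack.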

> Not always, but often enough that an introductory article that presents an overview of different memory management techniques should maybe use the longer name to avoid confusion.

Referring to garbage collection as tracing garbage collection creates more confusion and should be avoided.

It confuses folks into thinking that there is some garbage collection that isn’t tracing. There’s no such thing.

> What's your opinion on the term "RC with cycle detection" that some use for things like Python's GC?

Depends on how you detect cycles. Python uses a garbage collector. Therefore I would say that Python has a GC and is a GC’d language.

> Moving to a conservative GC might also introduce memory leaks, if you're unlucky.

Folks who adopt conservatism in production do so only if they have a story for avoiding those leaks. (That’s what we did in JavaScriptCore.)

> But yes, "tracing" gc and rc obviously behave very differently, and have very different performance considerations.

Just call it “GC” and everyone will know what you mean. No need to be a contrarian and put “tracing” in front.

And it’s not perf considerations if it’s the difference between your program running at all and crashing. Failing to collect cycles, as RC does, would cause a program written in a GC’d language to simply crash if it ran for more than a short while. Failing to invoke destructors the way RC’d programs expect, which would happen if you tried to switch to GC, will cause observably different behavior in addition to possible performance issues.

> It confuses folks into thinking that there is some garbage collection that isn’t tracing. There’s no such thing.

It is standard to consider reference counting as garbage collection.

Bacon, D.F., Cheng, P. and Rajan, V.T. 2004. A unified theory of garbage collection. ACM SIGPLAN Notices. 39, 10 (Oct. 2004), 50–68. DOI: https://doi.org/10.1145/1035292.1028982

Abstract:

Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. [...] Using this framework, we show that all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting.

> And it’s not perf considerations if it’s the difference between your program running at all and crashing.

Yeah, that's what I meant to include by "behave very differently". I don't think we disagree on anything technical here. The problem is if you are currently googling for garbage collection you will mostly get garbage results. Here's duckduckgo to avoid my search bubble: https://duckduckgo.com/?q=garbage+collection+programming

The first result is the wikipedia article: https://en.wikipedia.org/wiki/Garbage_collection_%28computer... It's pretty bad: under "Strategies" it lists three: Tracing, Reference Counting and Escape Analysis. I'm sure these three are similar things.

The second result is this blog post, also listing reference counting as gc: https://www.freecodecamp.org/news/a-guide-to-garbage-collect...

And the third result looks okay. Searching for "tracing garbage collection" has better results. The text in question already uses "gc" most of the time, and has a footnote saying:

> By "garbage collection", we're referring to tracing garbage collection.

I think that's as clear as it gets, without going on a rant about the names of things. You are clearly an expert in garbage collectors, but most people in the target audience of that article are not. The article compares the differences between rc and gc. If someone then goes and reads the wikipedia articles about either of those, they will be very confused because wikipedia will tell them rc is gc. A "fad" like this can't be undone once a usage of a word becomes this popular.

Okay, sorry, this was too long, and we agree to like 99% anyway. Have a nice day! :)

That reminds me of the time I went on a first date and she asked, "so what do you do exactly?" and I said I work on "garbage collection". You should have seen her face!