| There have been a lot of posts related to garbage collection lately but none of them touched upon what I see as a crucial issue: why is garbage collection needed to begin with? Could you do without it? What is the key point that made it necessary? I'm aware of it being introduced by McCarthy in the original Lisp paper in 1960 of course. But I suspect what McCarthy originally meant is not what garbage collection turned out to be. What I suspect he meant was that there needs to be a way for memory to be managed. malloc/free offer a way for memory to be managed, and presumably they weren't invented until C was nine years later. What McCarthy might have meant is what became malloc/free in C, which doesn't need garbage collection. C isn't the only flag here. Was there any OS on the IBM 704 used to implement the original Lisp? Did the OS support multiprocessing? Because if it didn't (UNIX wasn't invented until 1969 either) it would make sense for memory to be available for a single process. And it would mean when people said garbage collection they were envisioning malloc/free. (Also, databases and operating systems can live without whatever makes garbage collection necessary, since they don't use it, and those are pretty complex and fast pieces of software.) So, what makes garbage collection different than malloc/free, and why is it necessary? I'd love to learn more about that. |
Well, not really. Truly manual memory management is getting an enormous block of memory from the OS, and fully manually choosing what goes where within that block of memory. This is indeed a real technique used when the situation is dire enough. If you're not doing that, you're deferring something to your automated solution, the only question is, how much?
malloc/free is a combination that still gives you lots of power, but it's not perfect when you start pushing the abstraction hard enough. All malloc/free combinations have some sort of pathological case, where whatever allocation strategy they are choosing is wrong for some reason. That's why there isn't just one implementation, there's many, and some applications can see big wins switching, while others may see losses.
Garbage collection isn't really some sort of binary dichotomous decision vs. malloc/free; both are forms of automated memory management. Garbage collection techniques just step it up, and try to infer when you are done with memory. The disadvantage is, they may not be as smart as a human, the advantage is that they're a heck of a lot more consistent and pay much better attention. Then, even within "garbage collection", you've got a gradient; you may see a language like Rust with mostly deterministic memory management, but with an easy call-out to GC if you need it. You may see a language like Perl, with relatively consistent finalization as soon as an unshared reference leaves a scope, or you may see something that only collects during sweeps. At the far end you get imprecise garbage collectors, such as those used in Go right now (though they are working on it, and have already made a lot of progress), so even within the realm of GC there's a range of precision vs. speed vs. nuances the programmer needs to know to use them.
GC is necessary because one particular point on this large spectrum isn't the right choice for every program. It isn't even the maximally manual solution. In fact, there's even some middle grounds between malloc/free and fully manual memory management, such as arena allocation. There's a lot of fine gradation on this scale.
"(Also, databases and operating systems can live without whatever makes garbage collection necessary, since they don't use it, and those are pretty complex and fast pieces of software.)"
A bold statement. Have you ever heard the term "compaction" used within the context of a database? That's a form of garbage collection. Contrary to what you said, almost all databases have some form of garbage collection. (Probably all the ones you're thinking about.)
As for whether operating systems have garbage collection, it depends on your precise definition of "operating system". Kernels may lack it, but as critical as they are, and as much sheer staggering code they may have for drivers and stuff, conceptually they actually aren't that complicated, compared to a lot of userland things. Broaden your definition and you'll find some form of garbage collection appear again. And if you include "reference counting" approaches as a form of garbage collection, the Linux kernel uses that all over the place. Is reference counting a form of garbage collection? Well, it can actually be a tough call... because it's all on a continuum.