> > A GC system with explicitly visible reference counts (and immediate freeing) with language support to make it easier to get the refcounts right [...]
>
> To be a little pedantic on the subject, such a system (reference counting and immediate freeing) is a form of automatic memory management, but it is not GC in any way. Garbage collection implies that the system leaves garbage around, which needs to be collected in some way or another. The usual approach to refcounting releases resources as soon as they are no longer required (either by free()ing them immediately or by sending them to a pool of unused resources), and thus doesn't leave garbage around and doesn't need a collector thread or mechanism. There are partial-GC implementations of refcounting, either because items are not free()d when they reach zero references, or to automatically detect reference loops, which are not handled directly.
>
> I agree with Torvalds on this matter. GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing an immeasurable pile of complexity to the system, creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.
>
> This is my problem with GC. I like simplicity. Simplicity tends to perform well, and being simple also means it has little room for problems. Refcounting is simple and elegant: you just have to take care of reference loops, which also have a simple solution, namely weak references. I can teach a class of CS students everything they need to know to design a refcounting resource management system in one lesson. GC is the opposite: it is big and complex, and the more you try to fix it, the more complex it becomes. The original idea is simple, but nobody uses the original idea because it performs so badly. To teach the same class how to design a GC system that performs as well as we expect today, an entire semester may not be enough.
In a way, I do as well.
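To make the quoted scheme concrete, here is a toy sketch of explicit reference counting with immediate release to a pool, written in Python purely for illustration. The names `Pool`, `Counted`, `retain`, `release`, and `WeakRef` are hypothetical, not any real library's API. It shows a count reaching zero freeing the object on the spot, a weak back-pointer letting a parent/child pair be freed, and two strong pointers forming a loop that is never freed.

```python
class Pool:
    """Hypothetical pool of unused resources; freed objects land here at once."""

    def __init__(self):
        self.free = []


class Counted:
    """Hypothetical object with an explicitly visible reference count."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool
        self.count = 1      # the creator holds the first reference
        self.children = []  # strong references owned by this object
        self.parent = None  # optional weak back-pointer
        self.alive = True

    def retain(self):
        self.count += 1
        return self

    def release(self):
        self.count -= 1
        if self.count == 0:
            # Immediate freeing: no collector, no leftover garbage.
            self.alive = False
            for child in self.children:
                child.release()
            self.children.clear()
            self.pool.free.append(self.name)


class WeakRef:
    """Points at a Counted object without bumping its count."""

    def __init__(self, target):
        self.target = target

    def get(self):
        return self.target if self.target.alive else None


# Parent owns child strongly; child points back weakly, so there is no loop.
pool = Pool()
parent = Counted("parent", pool)
child = Counted("child", pool)
parent.children.append(child.retain())
child.parent = WeakRef(parent)
child.release()   # creator drops its handle; parent still holds one
parent.release()  # frees parent, which in turn frees child
print(pool.free)  # ['child', 'parent']

# Two strong pointers form a loop: both counts stay at 1, nothing is freed.
loop_pool = Pool()
a = Counted("a", loop_pool)
b = Counted("b", loop_pool)
a.children.append(b.retain())
b.children.append(a.retain())
a.release()
b.release()
print(loop_pool.free)  # []
```

Note that breaking the loop is entirely the programmer's job here: nothing in the toy notices that `a` and `b` have become unreachable.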
> GC as it is promoted today is a giant step that gives programmers one benefit, solving one problem, while introducing an immeasurable pile of complexity to the system, creating another pile of problems that are still not fixed today. And to fix some of these problems (like speed) you have to introduce more complexity.
There are plenty of contexts where speed is a non-issue, and in those cases GC has been a huge win. What matters there is the conceptual simplicity. Any resources saved by explicit, optimized memory management would be far outweighed by the resources required to implement it.
> The original idea is simple, but nobody uses the original idea because it performs so badly.
This is simply not true.
In the context of IO-bound enterprise systems, I've seen generational GC perform admirably, almost magically. As a lark, I've put infinite loops into such apps that do nothing but allocate new objects, and unless you were doing an exceptionally intense operation, you couldn't tell the difference. Properly tuned generational GC can seem like a truly fantastic thing!
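One reason an allocate-in-a-loop experiment can go unnoticed is that a generational collector traces only the young generation (the nursery) on most collections, so short-lived garbage costs almost nothing to reclaim. The toy below is a hypothetical model, not how any particular JVM or .NET collector works: it promotes survivors after a single minor collection, and it assumes every pointer store goes through `write_ref`, which acts as the write barrier that records old-to-young pointers in a remembered set.

```python
class Obj:
    """A heap object in the toy model; `refs` are its outgoing pointers."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.old = False


class ToyGenHeap:
    def __init__(self, nursery_size=4):
        self.young = []          # the nursery
        self.old = []            # the old generation
        self.roots = set()       # stand-in for stacks and globals
        self.remembered = set()  # old objects that point into the nursery
        self.nursery_size = nursery_size
        self.minor_collections = 0

    def alloc(self, name):
        if len(self.young) >= self.nursery_size:
            self.minor_gc()
        obj = Obj(name)
        self.young.append(obj)
        return obj

    def write_ref(self, src, dst):
        # Write barrier: remember old -> young pointers so a minor
        # collection can find them without scanning the old generation.
        src.refs.append(dst)
        if src.old and not dst.old:
            self.remembered.add(src)

    def minor_gc(self):
        self.minor_collections += 1
        stack = [o for o in self.roots if not o.old]
        for o in self.remembered:
            stack.extend(r for r in o.refs if not r.old)
        live = set()
        while stack:  # trace only within the nursery
            o = stack.pop()
            if o not in live:
                live.add(o)
                stack.extend(r for r in o.refs if not r.old)
        for o in self.young:  # promote survivors; dead objects are dropped
            if o in live:
                o.old = True
                self.old.append(o)
        self.young = []
        self.remembered.clear()

    def major_gc(self):
        self.minor_gc()  # empty the nursery; every live object is now old
        stack, live = list(self.roots), set()
        while stack:  # full trace from the roots
            o = stack.pop()
            if o not in live:
                live.add(o)
                stack.extend(o.refs)
        self.old = [o for o in self.old if o in live]


heap = ToyGenHeap(nursery_size=4)
keep = heap.alloc("keep")
heap.roots.add(keep)
for i in range(100):  # the "infinite loop" of throwaway allocations
    heap.alloc(f"tmp{i}")
survivors = [o.name for o in heap.old]
print(heap.minor_collections, survivors)  # 25 ['keep']

child = heap.alloc("child")  # young object reachable only from old `keep`
heap.write_ref(keep, child)
heap.minor_gc()
print(child.old)  # True: found through the remembered set

heap.roots.discard(keep)
heap.major_gc()
print(heap.old)  # []
```

Each minor collection here does work proportional to the survivors, not to the hundred dead temporaries, which is the property the anecdote above leans on.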
However, I will agree that the concerns Linus highlights are real, and that refcounting systems, like the one in iOS, are by far the better choice in many contexts.
EDIT: I only victimized the system above in the TEST environment, but it was populated with roughly 2-week-old production data. The application in question is a traditional client/server desktop app used by a major energy company; it had 800 active users at the time and handled millions in transactions every minute.
IDEA: If someone built an augmented ref-counting system whose runtime included an optional cycle detector and something like LINT, but for the runtime reference graph, one would get most of the benefits of GC with the efficiency of ref-counting. I half expect someone to tell me that this already exists for Python.
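For what it's worth, CPython works roughly this way: objects are freed by reference counting when their count hits zero, and the `gc` module provides a supplemental collector whose job is finding reference cycles. The sketch below assumes CPython (implementations such as PyPy do not free objects immediately) and uses only standard-library calls to show three behaviors. First, an object is freed at once even with automatic collection disabled. Second, a two-node cycle survives until `gc.collect()`. Third, `gc.DEBUG_SAVEALL` keeps unreachable objects in `gc.garbage` for inspection, which is perhaps the closest built-in thing to a reference-graph LINT. The `Node` class and the helper functions are made up for the example.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at one other Node."""

    def __init__(self):
        self.other = None


def freed_by_refcount():
    was_enabled = gc.isenabled()
    gc.disable()  # automatic cycle collection off; refcounting still runs
    try:
        n = Node()
        probe = weakref.ref(n)
        del n  # last reference gone: CPython frees the object right here
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


def cycle_waits_for_collector():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # each count stays at 1 after we drop ours
        probe = weakref.ref(a)
        del a, b
        leaked = probe() is not None
        gc.collect()  # an explicit collection still runs while disabled
        return leaked, probe() is None
    finally:
        if was_enabled:
            gc.enable()


def saved_cycle_members():
    gc.collect()  # clear out any unrelated garbage first
    gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects for inspection
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a
        del a, b
        gc.collect()
        return [o for o in gc.garbage if isinstance(o, Node)]
    finally:
        gc.set_debug(0)
        gc.garbage.clear()


print(freed_by_refcount())          # True
print(cycle_waits_for_collector())  # (True, True)
print(len(saved_cycle_members()))   # 2
```

Since the collector only supplements reference counting, a program that is known not to create cycles can run with it disabled and still reclaim memory as counts reach zero.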