| It doesn't matter what the time to free memory is as long as it is O(n) w.r.t. the number of objects (or less), the limiting factor is allocation. As, in steady state (as you are assuming), all memory freed will be reused, and (effectively) all memory used will be written to at least once, and memory used by a group of objects is at least O(n) w.r.t. the number of objects. So no, I don't see the advantage you say. I see that, in the best case, it's on par with non-GC'd systems, asymptotically speaking. And yet, I see plenty of disadvantages. To start: I have yet to see a GC that's O(1) w.r.t. the amount of memory to free. Best case? I'll grant you that, for an extremely naive copying collector. But the chance of that best case occurring is... not significant. Especially with the more sophisticated algorithms. And again, as above, it doesn't matter how long it takes to free memory. As long as it's linear or sublinear, it'll be dominated, asymptotically speaking, by how long it takes to initialize that memory again. And in steady-state operation it will be reinitialized. Not to mention the practical effects. For instance: GC thrashes cache. Badly. It completely kills the concept of a working set. You can't work around that by making your heap larger - if anything it makes it worse. It makes the time period between thrashings longer, yes, but makes them worse when it does happen. GC also doesn't scale well to large numbers of cores. In particular, your worst case "always" will be when the GC has to check with every core to see if it still has a reference to something. And given that processor numbers are where most of the performance improvements seem to be coming in... You mentioned memory size improvements but didn't mention increasing numbers of cores. GC also (almost always) has field(s) per object. Minimum of O(1) overhead per object initialized. Which means that, generally speaking, the overhead doesn't asymptote to zero. It asymptotes to a constant - and non-negligible, at least from what I've seen - amount of overhead. |
I'm not saying a fast GC can reduce the cost of allocation. I'm saying a fast GC can reduce GC overhead. I think that's an uncontroversial statement. I can only insist so many times that Java's GC doesn't touch dead objects at any time during its scan. If you want to disagree with me, that is your prerogative.
I agree that a GC walk can thrash cache when it happens. However, a copying GC can also rearrange object layout to improve locality. Which effect is more significant depends on the workload.
I didn't mention the number of cores because it didn't occur to me that it was significant to you. GCs can scale just fine to many cores. No core has to "check" other cores: each core can check its own local data, depending on the tactics employed to deal with NUMA. There are always challenges in scaling any parallel workload, of course, and there are always solutions.
Our GC (in IBM's J9 JVM) uses 8 bits in each object. With classes aligned to 256 bytes, there are eight free bits in each object's class pointer. Hence, J9's object model has one word of overhead on top of the user's data, which is what malloc/free typically needs to track each object's size anyway. No disadvantage there.
[These are my personal opinions, not those of my employer.]