|
|
|
|
|
by _yosefk
3318 days ago
|
|
Got it, an interesting point and it follows that a use case can be engineered where a GC adds essentially zero overhead over an arena allocator. But in the general case where some objects are short-lived and others aren't, surely manually splitting your allocations to malloc for long-lived ones and some sort of arena_alloc for short-lived ones ought to be faster than allocating them all in one place, then copying the ones which are still reachable out of the area reserved for short-lived objects? (This is not to say a GC-based system will be slower "on average" because nobody knows what the "average" is. A realistic arena-based system can have objects most of which are short-lived but some do need to live longer and you only find out long after they're allocated; in that case, one has to manually reallocate those objects just like a GC would, and doing it, say, the C++ way is definitely more bug-prone than GC's bug-free handling of this, and one way to make it less bug-prone in C++ is to have deeper copies and avoid trying to minimize copying, and now you might easily be slower than a GC. I'm just saying that it's very easy to find a case when a system not getting any hints from the programmer wrt object lifecycles and instead discovering them fully automatically would be slower than a system which does get these hints. And of course a GC can provide ways to supply these hints, I'm just not aware of one which does - perhaps it's avoided on the theory that the GC algorithm might change and you don't want to make hints which operate in terms not portable between algorithms a part of your interface.) |
|
This is where things get difficult, actually, and you need benchmarks because:
1. You still have a bump allocator that's much faster than a general purpose first-fit/best-fit allocator and has generally better memory locality than a pool allocator.
2. Offsetting that may be the additional tracing and/or copying you are doing as a result of garbage collection.
3. Manual memory management techniques often have their own performance costs: std::shared_ptr and std::unique_ptr both add overhead and you sometimes see additional copying where lifetimes are difficult to predict.
Which one has the higher cost is often something that can be only tested for with actual code (and can go either way).
I'll also note that this is primarily of interest for functional languages, which often have a high allocation rate. For imperative programs, the memory allocation rate (and ratio of memory that contains pointers to memory that doesn't) is often so low that even a very basic mark/sweep collector would be fine, as long as pause times don't matter in your application domain. This means that it's often not really a practical issue, one way or the other.