Hacker News new | ask | show | jobs
by aidenn0 3038 days ago
On SBCL the bigger win for using a copying collector isn't the locality of reference (which helps with some loads, but hurts with others), but rather the fact that you can make an allocation be about two instructions in the non-GC case (pointer increment plus a bounds check).

I hadn't spent a lot of time thinking about how much faster this is than malloc/free until a question came up the other day here on HN to the extent of "why would anyone dynamically allocate an object that is smaller than a cache line?" In lisp a commonly allocated structure is a CONS cell which is two pointers in size, and is often smaller than the cache line. It would be very wasteful to do a malloc/free of 8 (or 16) bytes, in C but throughput is approximately identical compared to stack allocating them with SBCLs allocator.

1 comments

It never gets as fast as stack allocating variables. Consider a loop:

while(1) { Integer n = new Integer(....) vs int n = ... }

A C compiler would allocate the n variable on the stack, making the "allocation" completely free. But in a GC:ed language, the n variable would be bump allocated once every loop. That wouldn't in itself be so costly, but every so often, a GC cycle would be needed as the garbage accumulates. Furthermore, in C the address of the n variable stays in place while in a GC:ed language it moves a bit for each loop.

That is why escape analysis is a fruitful optimization. It takes heap allocated objects and attempts to stack allocate them, similar to how a C compiler would do it.

To be fair, I said "approximately" The cost of a nursery GC in SBCL is O(n+m) where n is the number of live objects in the nursery and m is the number of GC roots. If this code is all you are running then n will be 1, so it's just scanning the roots, which only happens when you have allocated enough integers to fill the nursery, so gets amortized across thousands of loop iterations.
> But in a GC:ed language, the n variable would be bump allocated once every loop.

Citation needed. I don't know of any GCed language that heap-allocates local variables in this way (well, maybe SML/NJ does, but I doubt it). Certainly not Java or any Common Lisp I've ever used.

But I agree with your larger point: heap allocation is never quite as fast as stack allocation, once you factor in the additional GC load. I don't actually know how close it gets with modern collectors; would love to see some numbers.

C# does and Java did before the escape analysis optimization became default. You can find numbers in this old article from 1999: https://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Read...

"and the overall execution time reduction ranges from 2% to 23% (with a median of 7%) on a 333 MHz PowerPC workstation with 128 MB memory."

The benchmark is 20 years old so it is kind of out of date. I don't know of any modern benchmarks. I suspect that the difference would be much bigger nowadays because programmers don't avoid allocating small local objects as much.

Oh, now I understand what you're saying. I got thrown off when you said "the address of the variable n [...] moves a bit for each loop". This isn't quite right; it's not the address of the variable n itself that changes, it's the address of the allocated object that it points to.