binary-trees is almost completely dominated by the time spent in allocator code, and stresses its throughput. This benchmark showcases how big of a gap is between manual per-thread arenas, then tracing generational multi-heap GCs, then ARC and more specialized designs like Go GC. Honorable mention goes to BEAM which also showcases excellent throughput by having process-level independent GCs, in this case resembling the behavior of .NET's and OpenJDK GC implementations.
A tree is indeed a bad fit for RC; so is anything else where you have multiple references to something but know there is a single real owner.
I'd suggest keeping strong references to all the tree nodes in an array, then having everything within the tree be unowned. Basically fake arena allocation.
is a common way you see toy data structures code written, but it's inefficient (because pointer chasing is slow) and there's better patterns. If you use the arena method above, you could use indexes into the arena. If not, intrusive data structures (where the references are inside Node instead of inside Tree) are better.
Pointer chasing is irrelevant here. It takes <= 15% of the execution time, and CPUs have gotten good at it. If it takes more - it speaks more about the quality of the allocator which has poor memory locality. As noted in my previous comment, it is dominated by the time spent in the allocator/GC code.
The requirement to implement the same algorithm with the same data structure is what makes this benchmark interesting and informative. Don't tell me "allocate parent object A, allocate child objects C, D and E and assign them to A's fields, then allocate array F and G, and assign them to D's fields" isn't the bread and butter of all kinds of application logic, something that this benchmark stresses.
Some CPUs are good at it, but most aren't. (Apple's are.)
But that's not the actual issue; the issue is that pointers are big (8 bytes) and indexes are smaller, so now you can fit more in the cache. It would also help GC because it doesn't have to trace them.
Also, I don't recommend intrusive structures merely because they'd be better for language games. I think they're better in general ;)
> But that's not the actual issue; the issue is that pointers are big (8 bytes) and indexes are smaller, so now you can fit more in the cache. It would also help GC because it doesn't have to trace them.
Please read 'binary-trees' description and submission rules (#2). You are missing the point(er).
binary-trees is almost completely dominated by the time spent in allocator code, and stresses its throughput. This benchmark showcases how big of a gap is between manual per-thread arenas, then tracing generational multi-heap GCs, then ARC and more specialized designs like Go GC. Honorable mention goes to BEAM which also showcases excellent throughput by having process-level independent GCs, in this case resembling the behavior of .NET's and OpenJDK GC implementations.