|
|
|
|
|
by poorlyknit
632 days ago
|
|
The fact that it happens 1) at/around powers of two
2) regardless whether using GC or preallocating
makes me think its got to do with cache sizes. For example, the M2 macbook has around 16MiB of cache, which gives you approximately 2 million ~ 2^20 nodes where each node has a 64 bit number and a 64 bit pointer.The jumps in your measurements could be related to the cache hierarchy. On Linux (probably macOS too) you should be able to run "perf stat ./mergesort" to show you the hitrate of your CPU's caches. I'd be interested in a follow-up post :) |
|
The linked list nodes are 16 bytes, and both 8-byte fields (.Next and .Data) get read and written during sorting. Up to a point, larger node sizes,which I would thing would change caching, don't change where the performance drops occur: https://bruceediger.com/posts/mergesort-investigation-7/
Sorting an already sorted, or reverse sorted list, doesn't exhibit the performance drops either: https://bruceediger.com/posts/mergesort-investigation-9/
A purely recursive mergesort doesn't show the performance drops: https://bruceediger.com/posts/mergesort-investigation-3/