|
|
|
|
|
by exyi
445 days ago
|
|
I intentionally stayed in the small integer range to avoid benchmarking the cache. 256 distinct values should fit into L1 just fine in both cases. I'm now thinking that the difference might be even larger if we instead avoid small integers and let the CPU get stuck chasing pointers. The idea is that it gets stuck on a memory access, which forces it to speculate much further, which in turn makes it backtrack a longer path if a branch was mispredicted. I'm obviously no expert on this, feel free to correct me The results for 1B range instead of 255 are 17.6 ms for unsorted / 68.2 ms for sorted! We are back to what the original article observed and it's a way stronger effect than what branch prediction can offer. So don't sort your arrays, keep them in the order the boxed values were allocated ;) |
|
Larger range one being slower unsorted yes makes sense because of allocation order no longer matching the iteration order.