|
|
|
|
|
by codedokode
49 days ago
|
|
I did little experiments with search in small arrays (16-32 items) and binary search is one of the worst methods because it requires lot of branches. The fastest method for small arrays was linear branchless search (you walk over all elements without breaking out of the loop. For example, if you want to know whether the array contains a number, you logically OR the checks for all items). I didn't use SIMD though, but the branches are very expensive for small arrays and simply checking all elements without branching is faster. |
|
That aside your findings point to the prefetcher having identified your linear searches as sequential access so practically every single access was a cache hit (you effectively measured the latency between a CPU and its L1 cache). If you wanted to test this you could do something like make each element some number of cache lines wide. Stream prefetchers have a maximum stride, so variables more than that many bytes apart won't be prefetched. Google's multichase benchmark uses 256 bytes IIRC.