by phkahler
1983 days ago
>> My understanding, based on the article, is that a normal processor, we would have expected arr[idx] + arr[idx+1] and arr[idx] to take the same amount of time.

That depends. If the two accesses fall on the same cache line, then yes. But since idx is random, they will sometimes straddle two cache lines. He never says how many elements array[] holds or how large each element is. I thought DRAM could also stream out consecutive addresses; if so, it looks like Apple could be missing out here. Then again, if his array fits in cache, he's just measuring instruction counts. His random indexes also need to cover the whole array. There's not enough info to figure out what's going on.
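To put a rough number on the "same cache line" case, here is a minimal sketch in Python. It is not taken from the article or its source code: the 64-byte line size, the 4-byte element size, the assumption that the array starts on a cache-line boundary, and the helper names `same_cache_line` and `straddle_fraction` are all hypothetical choices made for illustration.

```python
# Assumption: 64-byte cache lines. Real hardware may differ, so treat
# this as a parameter rather than a fact about any specific chip.
CACHE_LINE_BYTES = 64


def same_cache_line(idx, elem_bytes, line_bytes=CACHE_LINE_BYTES, base=0):
    """Return True if arr[idx] and arr[idx + 1] start on the same cache line.

    `base` is the byte offset of arr[0] from a cache-line boundary
    (0 means the array is line-aligned, which is an assumption).
    """
    first_line = (base + idx * elem_bytes) // line_bytes
    second_line = (base + (idx + 1) * elem_bytes) // line_bytes
    return first_line == second_line


def straddle_fraction(n_elems, elem_bytes, line_bytes=CACHE_LINE_BYTES):
    """Fraction of valid idx values whose (idx, idx + 1) pair crosses a line."""
    pairs = n_elems - 1  # idx + 1 must stay inside the array
    crossing = sum(
        not same_cache_line(i, elem_bytes, line_bytes) for i in range(pairs)
    )
    return crossing / pairs


if __name__ == "__main__":
    # Hypothetical array of 1025 four-byte elements.
    print(straddle_fraction(1025, 4))
```

Under these assumptions, only a small share of uniformly random indexes touch a second line, so whether that matters depends on the array size and element size the comment says are missing.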
If you only look at the article this is true. However, the source code is freely available: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...