|
|
|
|
|
by jnordwick
2479 days ago
|
|
He's going to have cache issues since he requires one extra memory lookup. His lookup should be roughly twice as expensive as a regular array when cache is cold - it will be dominated by the cache misses two vs one. With a hot cache, the array lookup becomes a single load op of a few cycles, and even if he gets two cache hits, his will probably be about 6-10 ops with 2 loads, a div, and a few more ops to get the index of the sub bucket. On one cache hit and a possible miss on the other (eg his top level index is in cache, but the bucket might not be), he's going to be getting high variance in his lookup timings.
OP: to help the iterator access (sum), you might be able to prefetch (eg force a read) of the folling bucket before you need it. when you move over the bucket 2, cause a fetch of bucket 3 and the same time. You might be able to hide some of that cache miss latency then. Also, with the DEQs you are potentially starting at the middle of a slab and having to roll around its front. That isn't going to prefetch well, so you might want to fetch the next next slab's base address and the address it logically starts at since those cane be different. Just try to hide as much cache miss latency as you can because that is where you are probably getting killed on in the iterator access. |
|