|
|
|
|
|
by remcob
2189 days ago
|
|
I tried cache-oblivious algorithms for some numerical code, but found them underwhelming. The cache-oblivious model does not consider the effects of prefetching and cache line associativity. Both can make a huge difference. They are a great starting point and work well for the RAM/Swap level of the memory hierarchy, but for the L1/L2 caches manual tuning still pays off. This unfortunately makes the implementation non-oblivious and even depended on the specific CPU used. |
|
I had some code that was slower than I intuitively expected it to be. I looked around for approaches and came across cache oblivious algorithms. In my case I was thinking of RAM as a cache and some of the inputs would fit in RAM and some were larger than RAM. All inputs and outputs were in network drives. The final output was 4.5 GB. Using cache obvious algorithms as a inspiration I managed to get my runtime down from 5 mins to 20 seconds (numbers are approximate as this was nearly ten years ago now).
I was quite pleased.