| Hi Author. I did not say that, but I did think it. I understand that it is confusing when you have no idea whether the person giving the admonishment "don't do this" is just sniffing glue, so I will try to explain why I think most people probably should not do this. Firstly, I think the graph in your article shows no advantage except with a cold cache. I agree with that; where we probably disagree is on how likely a "cold cache" is, or how to model it when thinking about a program. I argue you should never bother. In a game (or any other realtime scenario that runs continuously), every "n" frames you are going to spill the cache "m" times, and because the game runs continuously, "m/n" averages out to some value. I have worked on code that aims for m=0, and it's a little bit like superconductivity in some respects: strange things happen with your code when it looks like that. But remember that L1 is only 64KB, so if your program gets any bigger than that, or ever makes a system call and jumps into the kernel (which is definitely bigger than that), then m is at least 1. Microbenchmarks tend to be m=0, which makes working with them tricky. Real programs are usually at least m=1 and usually much, much bigger: m=1 is still something like 100GB (gigabytes) per second. Now, through all that, it is tempting to think of m as the sum of latencies, but it's important to remember that the size of the code matters too: those instructions have to be fetched from memory just like data does, so if your program gets too big (as can happen if you naively unroll every lookup table into its own function), m is going to grow faster than anything else. Look at objdump. Look at how big your program gets doing this. How many multiples of 64KB have you eaten with this trick? Each one of those is another m. 
That is why this trick is almost certainly never worth it, and if you ever find a big codebase that benefits from a few strategic hoists of this technique, I bet putting a prefetch intrinsic in the right place would help more. |
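To make the prefetch suggestion concrete, here is a minimal sketch of what "a prefetch intrinsic in the right place" could look like for a table-driven loop. It assumes GCC or Clang, where `__builtin_prefetch` is available; the function name `sum_lookups` and the prefetch distance `AHEAD` are hypothetical and only for illustration. Whether it helps at all, and at what distance, is something you would have to measure in the real program rather than in a microbenchmark.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum table[idx[i]] over n lookups.
 * The table stays a plain data table (no per-entry functions), and we
 * hint the cache about the entry we will need a few iterations from now. */
uint64_t sum_lookups(const uint32_t *table, const uint32_t *idx, size_t n)
{
    enum { AHEAD = 8 }; /* assumed prefetch distance; tune by measurement */
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + AHEAD < n) {
            /* GCC/Clang builtin: args are address, rw (0 = read),
             * locality (0..3). It is only a hint and never faults. */
            __builtin_prefetch(&table[idx[i + AHEAD]], 0, 1);
        }
        total += table[idx[i]];
    }
    return total;
}
```

The point of the sketch is that the code stays small: one loop and one table, so it costs almost nothing in instruction-cache footprint compared with unrolling the table into separate functions.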
https://youtu.be/i5MAXAxp_Tw