by fulafel
2681 days ago
The idea is that the Y dimension will have only a limited number (here 64) of hot cache lines while a tile is processed.
After going through one set of 64 vertical lines, the Y accesses will be near the Y accesses from the previous outer-tile-loop iteration. (Stride-detecting prefetch can help, especially on the first iteration of a tile, but is not required for a speedup.) BTW, this is the motivation for GPUs (and sometimes other graphics applications) using "swizzled" texture/image formats, where pixels are organised into various kinds of screen-locality-preserving clumps. https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-...
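
To make the tiling idea concrete, here is a rough sketch in C. The loop under discussion isn't quoted in this comment, so a byte-matrix transpose is used as a stand-in; the function names, the row-major layout and the TILE constant are my own assumptions for illustration, with 64 chosen only to match the number above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 64  /* assumed tile width, matching the "64" in the comment */

/* Untiled version: src is w wide and h high (row-major), dst is h wide
 * and w high. Each step in x writes to a different dst row, so one pass
 * over a src row touches w distinct cache lines in dst. */
void transpose_naive(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    assert(src != dst);  /* in-place transpose is not supported */
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[x * h + y] = src[y * w + x];
}

/* Tiled version: the outer loop picks a strip of at most TILE source
 * columns. Inside a strip only TILE dst rows are written, so at most
 * about TILE dst cache lines are hot at a time, and consecutive y
 * iterations write next to the bytes written by the previous one. */
void transpose_tiled(const uint8_t *src, uint8_t *dst, size_t w, size_t h)
{
    assert(src != dst);  /* in-place transpose is not supported */
    for (size_t x0 = 0; x0 < w; x0 += TILE) {
        size_t x1 = (x0 + TILE < w) ? x0 + TILE : w;  /* clamp last strip */
        for (size_t y = 0; y < h; y++)
            for (size_t x = x0; x < x1; x++)
                dst[x * h + y] = src[y * w + x];
    }
}
```

Both functions produce the same output; only the traversal order differs. Whether the tiled one is actually faster depends on the matrix size relative to the cache and on the hardware, so measure before relying on it.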