by tda 2320 days ago
Why is (1025, 1025, 1025) so much faster than (1024, 1024, 1024)?
1 comment

My guess is that it's mostly due to cache conflicts. With 1024 and a simplified 32 KB L1, you can fit exactly 8 rows of the inner dimension in the cache (assuming 4-byte elements, so one row is 4 KB), which means that (0, 8, 0) would map to the same cache location as (0, 0, 0), which is bad for tiling.
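To make the arithmetic concrete, here is a rough sketch of that simplified model. Everything in it is an assumption for illustration, not a measurement: a direct-mapped 32 KB cache, 64-byte cache lines, 4-byte elements, a row-major array starting at address 0, and the hypothetical helper names `cache_slot` and `distinct_slots`. Real L1 caches are set-associative, so the actual effect will differ in detail.

```python
# Simplified model (all values are assumptions for illustration):
# direct-mapped 32 KB cache, 64-byte lines, 4-byte elements, row-major layout.
CACHE_BYTES = 32 * 1024
LINE_BYTES = 64
ELEM_BYTES = 4
NUM_SLOTS = CACHE_BYTES // LINE_BYTES  # 512 line slots in the cache


def cache_slot(index, shape):
    """Return the cache line slot that element (i, j, k) maps to."""
    i, j, k = index
    _, n1, n2 = shape
    byte_offset = ((i * n1 + j) * n2 + k) * ELEM_BYTES
    return (byte_offset // LINE_BYTES) % NUM_SLOTS


def distinct_slots(n, rows=512):
    """Count distinct slots touched when walking (0, j, 0) for j < rows."""
    shape = (n, n, n)
    return len({cache_slot((0, j, 0), shape) for j in range(rows)})


if __name__ == "__main__":
    for n in (1024, 1025):
        print(n, distinct_slots(n))
```

In this model, walking 512 rows of the middle dimension touches only 8 distinct slots for 1024 but 256 for 1025, since the extra 4 bytes per row gradually shift each row onto a different cache line. That is the conflict pattern described above, under the stated assumptions.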