Hacker News
by saagarjha 589 days ago
I don't think so? It is too late for me to actually do the math on this, but take the degenerate case where the tile size is literally 1 element: then you do as many loads as arithmetic operations. So I would expect any sort of fixed tiling (which you are resigned to, since your caches are of limited size) to require O(n^3) loads?
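
To make the counting concrete, here is a rough sketch of the argument in Python. It assumes a deliberately simple cost model that is my own, not anything established in the thread: square n x n matrices, each (i, j, k) tile step loads one tile of A and one tile of B, the C tile stays resident, and a multiply-add counts as two arithmetic operations. The function name `tiled_matmul_counts` is made up for illustration.

```python
def tiled_matmul_counts(n, tile):
    """Count element loads and arithmetic ops for an n x n tiled matmul.

    Assumed model (hypothetical, for illustration only): every
    (i, j, k) tile step loads one tile of A and one tile of B from
    memory, while the C tile is kept in cache/registers.
    """
    loads = 0
    flops = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Edge tiles may be smaller when tile does not divide n.
                ti = min(tile, n - i0)
                tj = min(tile, n - j0)
                tk = min(tile, n - k0)
                loads += ti * tk + tk * tj   # A tile + B tile
                flops += 2 * ti * tj * tk    # one multiply and one add each
    return loads, flops


if __name__ == "__main__":
    for n in (8, 16, 32):
        for tile in (1, 4):
            loads, flops = tiled_matmul_counts(n, tile)
            print(f"n={n} tile={tile} loads={loads} flops={flops}")
```

Under this model the load count works out to 2n^3 / T for tile size T (when T divides n), so with T = 1 loads equal arithmetic operations, and for any fixed T the loads still grow as O(n^3), just with a smaller constant.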