|
|
|
|
|
by stephencanon
590 days ago
|
|
Sure, but still on each tile, you do O(k^3) compute with O(k^2) memory, and you generally arrange things so that at least one tile is in L1 and at least one other is in L2/LLC (using CPU idioms), so again, you have plenty of bandwidth (typical choices of k are in the ballpark of ~32, and a 32:1 compute to memory ratio is just fine on most hardware, especially if some of those accesses are coming from fast memory) |
|