by KeplerBoy
590 days ago
Sure, but rooflines don't account for stuff like memory access granularity. You not only have to do a lot of flops per byte to achieve the necessary arithmetic intensity, you also have to access those bytes in a coalesced way. I.e., you want neighboring threads to access consecutive bytes, and ideally the data you reuse is already in registers.
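
To put rough numbers on the granularity point, here is a toy model, not a measurement. It assumes a 32-thread warp, 32-byte memory sectors, and one 4-byte float per thread; those constants and the helper names (`sectors_touched`, `efficiency`, `effective_intensity`) are illustrative assumptions, and real hardware adds caches and other effects this ignores.

```python
WARP_SIZE = 32      # threads issuing one load together (assumed)
SECTOR_BYTES = 32   # smallest chunk the memory system moves (assumed)
ELEM_BYTES = 4      # one float32 per thread


def sectors_touched(stride_elems: int) -> int:
    """Distinct sectors fetched when thread t reads element t * stride_elems."""
    addresses = [t * stride_elems * ELEM_BYTES for t in range(WARP_SIZE)]
    return len({addr // SECTOR_BYTES for addr in addresses})


def efficiency(stride_elems: int) -> float:
    """Fraction of the bytes moved that the warp actually asked for."""
    useful = WARP_SIZE * ELEM_BYTES
    moved = sectors_touched(stride_elems) * SECTOR_BYTES
    return useful / moved


def effective_intensity(flops_per_useful_byte: float, stride_elems: int) -> float:
    """Arithmetic intensity measured against bytes moved, not bytes requested."""
    return flops_per_useful_byte * efficiency(stride_elems)


if __name__ == "__main__":
    for stride in (1, 2, 8, 32):
        print(stride, sectors_touched(stride), efficiency(stride))
```

Under these assumptions, a stride of 1 touches 4 sectors and uses every byte moved, while a stride of 8 or more touches 32 sectors and uses only an eighth of them. A kernel that looks like 8 flops/byte on paper then behaves like 1 flop/byte against the bandwidth roof, which a plain roofline plot doesn't show.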