by thomaskoopman
471 days ago
I think it depends more on the ratio between access time and how often you reuse the data. Adding two arrays that fit in L1 is already limited by access time: on Zen 3, we can add two 32-byte vectors per cycle, but only store one 32-byte vector per cycle. For matrix multiplication, we can sustain the two operations per cycle (or really c <- a * b + c), because we perform multiple operations on the data once it has been loaded into registers. I can see it being useful for data sets of a few dozen MB as well.
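To make the reuse argument concrete, here is a rough C sketch of the two kernels. The function names and the "flops per memory access" helpers are made up for illustration, and the matmul ratio assumes the ideal case where every matrix element is loaded or stored only once, which a naive loop like this will not actually achieve for large n.

```c
#include <stddef.h>

/* Streaming add: every element costs 2 loads + 1 store for a single add,
 * so the store port (one 32-byte store per cycle on Zen 3, per the comment
 * above) becomes the limit rather than the adders. */
void add_arrays(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Naive row-major n x n matmul, C <- A * B + C. The accumulator stays in a
 * register across the k loop, so each store is amortized over n
 * multiply-adds. */
void matmul_acc(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Illustrative ratio for the add kernel: 1 flop per 3 memory accesses. */
double add_flops_per_access(void) {
    return 1.0 / 3.0;
}

/* Idealized ratio for matmul (assumption: perfect reuse):
 * 2*n^3 flops over 4*n^2 accesses (read A, B, C; write C) = n / 2. */
double matmul_flops_per_access(size_t n) {
    return (double)n / 2.0;
}
```

The add kernel stays at one flop per three accesses no matter how large the arrays are, while the idealized matmul ratio grows linearly with n. That growing ratio is why matmul can keep the arithmetic units busy once the data is in registers.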