by 37ef_ced3
2020 days ago
Enough that the data panel of the input tensor fills the thread's share of the L2 cache, and the output tensor is of similar depth. So it depends on the cache size, but you can think of it as being about 512 channels in, 512 channels out, something like that.
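To make the "depends on the cache size" part concrete, here is a rough back-of-the-envelope sketch. The function name, the 256 KiB per-thread L2 share, the 128 elements per channel in a panel, and the float32 element size are all assumptions picked for illustration, not figures from any particular CPU or library.

```python
def channels_to_fill_cache(l2_bytes_per_thread, panel_elems_per_channel, bytes_per_element=4):
    """Estimate the channel count at which one input panel occupies
    the thread's share of L2.

    All parameters are hypothetical knobs for illustration:
    - l2_bytes_per_thread: bytes of L2 available to one thread
    - panel_elems_per_channel: elements each channel contributes to the panel
    - bytes_per_element: 4 for float32
    """
    bytes_per_channel = panel_elems_per_channel * bytes_per_element
    return l2_bytes_per_thread // bytes_per_channel


# Assumed numbers: 256 KiB of L2 per thread, 128 float32 elements per channel.
print(channels_to_fill_cache(256 * 1024, 128))  # 512
```

With a larger assumed cache share the estimate scales linearly, so treat 512 as an order of magnitude rather than a threshold.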