by bwasti
2479 days ago
Note that this is a layout trick, not an algorithmic one. An algorithmic speedup that works well for dense convolutions with small kernels is Winograd: https://arxiv.org/abs/1509.09308
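To make the "algorithmic" part concrete, here is a minimal plain-Python sketch of the 1D F(2,3) case described in the linked paper. It computes two outputs of a 3-tap filter from four inputs using four multiplications instead of the six a direct computation needs. The filter transform is done once and reused across every tile. The function names are hypothetical, and real implementations use the nested 2D form (e.g. F(2x2, 3x3)) over batches of tiles, so treat this only as an illustration of where the savings come from.

```python
from typing import List, Sequence


def direct_correlation_1d(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Reference: 'valid' 1D correlation (what CNN layers call convolution)."""
    r = len(g)
    return [sum(d[i + k] * g[k] for k in range(r)) for i in range(len(d) - r + 1)]


def winograd_filter_transform(g: Sequence[float]) -> List[float]:
    """Transform a 3-tap filter once; the result is reused for every input tile."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_f23_tile(d: Sequence[float], u: Sequence[float]) -> List[float]:
    """F(2,3): 2 outputs from 4 inputs with 4 multiplies (direct needs 6)."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def winograd_correlation_1d(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Slide overlapping 4-element input tiles with stride 2 (assumes a 3-tap filter)."""
    n_out = len(d) - 2
    if n_out % 2 != 0:
        raise ValueError("this sketch assumes an even number of outputs")
    u = winograd_filter_transform(g)
    out: List[float] = []
    for i in range(0, n_out, 2):
        out.extend(winograd_f23_tile(d[i:i + 4], u))
    return out


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    g = [0.5, -1.0, 2.0]
    print(direct_correlation_1d(d, g))    # [4.5, 6.0, 7.5, 9.0]
    print(winograd_correlation_1d(d, g))  # [4.5, 6.0, 7.5, 9.0]
```

The extra additions in the input and output transforms are cheap compared to the multiplications saved, which is why the trick pays off for small kernels and stops paying off as kernels grow.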
For large kernels, computing the convolution via an FFT tends to help. Also worth keeping in mind: many modern networks use depthwise separable convolutions. These are channel-wise convolutions, which skip the reduction over channels and so tend to be memory bound, followed by 1x1 convolutions, which are exactly matrix multiplications with no im2col step.
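A minimal pure-Python sketch of a depthwise separable convolution, assuming a nested-list CHW layout, stride 1 and no padding (the function names are hypothetical, not any framework's API). The depthwise step correlates each channel with its own kernel and never sums across channels; the 1x1 pointwise step is written literally as a (C_out x C_in) by (C_in x H*W) matrix multiply.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]
Matrix = List[List[float]]


def depthwise_conv(x: Tensor3, kernels: Tensor3) -> Tensor3:
    """Correlate channel c with its own k x k kernel kernels[c] ('valid', stride 1).
    No sum over channels: each output value costs only k*k multiply-adds."""
    k = len(kernels[0])
    h_out = len(x[0]) - k + 1
    w_out = len(x[0][0]) - k + 1
    return [
        [
            [
                sum(x[c][i + a][j + b] * kernels[c][a][b] for a in range(k) for b in range(k))
                for j in range(w_out)
            ]
            for i in range(h_out)
        ]
        for c in range(len(x))
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive (n x m) @ (m x p) matrix multiply."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def pointwise_conv(x: Tensor3, weights: Matrix) -> Tensor3:
    """1x1 convolution as a plain matrix multiply with weights of shape C_out x C_in.
    In a contiguous CHW buffer, viewing x as a (C_in x H*W) matrix is free; the
    list comprehension below only mimics that reshape. No im2col is involved."""
    h, w = len(x[0]), len(x[0][0])
    x_mat = [[v for row in plane for v in row] for plane in x]  # C_in x (H*W)
    y_mat = matmul(weights, x_mat)  # (C_out x C_in) @ (C_in x H*W)
    # Reshape each output row back into an H x W plane.
    return [[y_mat[o][i * w:(i + 1) * w] for i in range(h)] for o in range(len(weights))]


def depthwise_separable_conv(x: Tensor3, dw_kernels: Tensor3, pw_weights: Matrix) -> Tensor3:
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


if __name__ == "__main__":
    x = [[[float(c + i + j) for j in range(4)] for i in range(4)] for c in range(2)]  # 2 x 4 x 4
    dw = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]  # one 2x2 kernel per channel
    pw = [[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]]  # C_out=3, C_in=2
    y = depthwise_separable_conv(x, dw, pw)
    print(len(y), len(y[0]), len(y[0][0]))  # 3 3 3
```

The result equals a standard convolution whose kernel is factored as `pw[o][c] * dw[c]`, but it uses roughly C_in*k*k + C_in*C_out multiplies per output pixel instead of C_in*C_out*k*k. A real implementation would hand the pointwise matrix to a BLAS GEMM rather than the naive `matmul` above.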