| HN Mirror

	by bwasti 2525 days ago
	Note that this is a layout trick, not an algorithmic one. An algorithmic speedup that works well for dense convolutions with small kernels is Winograd: https://arxiv.org/abs/1509.09308 For large kernels, an FFT-based implementation tends to help. Also worth keeping in mind that many modern networks use depthwise separable convolutions, which are channel-wise convolutions (these skip the reduction over channels and are memory-bound) followed by 1x1 convolutions (which are exactly matrix multiplications, with no im2col step).
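
To make the last point concrete, here is a rough pure-Python sketch of a depthwise separable convolution on nested lists. The function names, the channel-first (C, H, W) layout, the "valid" padding and the toy sizes are all my own assumptions for illustration, not taken from any particular framework. It shows that the depthwise step never sums across channels, and that the 1x1 step reduces to a single reshape plus matrix multiply.

```python
import random


def depthwise_conv(x, k):
    """Channel-wise 'valid' cross-correlation: channel c only sees kernel k[c].

    x is indexed x[c][h][w], k is indexed k[c][kh][kw].
    There is no sum over channels, so the channel count is unchanged.
    """
    kh, kw = len(k[0]), len(k[0][0])
    out_h = len(x[0]) - kh + 1
    out_w = len(x[0][0]) - kw + 1
    return [[[sum(k[c][a][b] * x[c][i + a][j + b]
                  for a in range(kh) for b in range(kw))
              for j in range(out_w)]
             for i in range(out_h)]
            for c in range(len(x))]


def conv1x1_direct(x, w):
    """1x1 convolution as explicit loops; w is indexed w[c_out][c_in]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(w))]


def matmul(a, b):
    """Plain (M x K) @ (K x N) matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conv1x1_as_matmul(x, w):
    """Same 1x1 convolution as one matmul over the flattened spatial axis."""
    height, width = len(x[0]), len(x[0][0])
    # Reshape (C_in, H, W) -> (C_in, H*W). No patches are gathered (no im2col).
    x_flat = [[v for row in channel for v in row] for channel in x]
    y_flat = matmul(w, x_flat)  # (C_out, C_in) @ (C_in, H*W)
    # Reshape (C_out, H*W) -> (C_out, H, W).
    return [[r[i * width:(i + 1) * width] for i in range(height)]
            for r in y_flat]


# Toy integer data (hypothetical sizes): 3 channels of 5x5, 3x3 depthwise
# kernels, then a 1x1 convolution mixing 3 channels into 4.
random.seed(0)
x = [[[random.randint(-3, 3) for _ in range(5)] for _ in range(5)]
     for _ in range(3)]
dw = [[[random.randint(-2, 2) for _ in range(3)] for _ in range(3)]
      for _ in range(3)]
pw = [[random.randint(-2, 2) for _ in range(3)] for _ in range(4)]

mid = depthwise_conv(x, dw)       # still 3 channels, now 3x3 spatially
out = conv1x1_as_matmul(mid, pw)  # 4 channels, 3x3 spatially
assert out == conv1x1_direct(mid, pw)
```

Integers are used so the two 1x1 paths can be compared exactly. A real implementation would hand the (C_out, C_in) @ (C_in, H*W) product to a tuned GEMM, which is where the practical benefit comes from.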