

	by Hydraulix989 4212 days ago
	In my experience, the fully connected layers are the bottleneck. The other issue was the alternation between compute-heavy convolution layers and IO-heavy pooling layers. I'm curious how this FFT implementation stacks up against cuDNN: what's the speedup for just the convolutional layers, and what's the overall speedup?
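For readers unfamiliar with the idea, an FFT-based convolution relies on the convolution theorem: zero-pad the input and the filter, transform both, multiply pointwise in the frequency domain, and inverse-transform. The following is a minimal 1D pure-Python sketch of that principle only. It is not the GPU implementation being discussed, and `fft`, `direct_conv`, and `fft_conv` are hypothetical names. CNN layers usually compute cross-correlation, which is the same operation with the filter flipped.

```python
import cmath
from typing import List


def fft(x: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Full linear convolution computed directly: O(len(signal) * len(kernel))."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_conv(signal: List[float], kernel: List[float]) -> List[float]:
    """Same result via the convolution theorem: zero-pad both inputs to a
    power of two >= the full output length, multiply pointwise in the
    frequency domain, then inverse-transform and divide by n."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:
        n *= 2
    fs = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    fk = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    prod = [a * b for a, b in zip(fs, fk)]
    result = fft(prod, invert=True)
    return [(v / n).real for v in result[:out_len]]
```

For example, `direct_conv([1.0, 2.0, 3.0], [0.0, 1.0, 0.5])` returns `[0.0, 1.0, 2.5, 4.0, 1.5]`, and `fft_conv` on the same inputs agrees up to floating-point rounding. The FFT route wins once filters get large enough that the transforms cost less than the direct multiply-accumulate loop.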


http://arxiv.org/pdf/1412.7580v2.pdf compares the FFT convolution implementation against the cuDNN layers. The fully connected layers just use cuBLAS `sgemm`.
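Why a fully connected layer is "just `sgemm`": for a mini-batch, the forward pass is one matrix product of the input batch with the weight matrix, plus a bias. Below is a minimal pure-Python sketch. The `sgemm` function is a hypothetical stand-in that mimics the BLAS GEMM contract C ← αAB + βC (row-major, no transposes). It is not the cuBLAS API. The weight layout and the trick of pre-loading the bias into C with β = 1 are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def sgemm(alpha: float, a: Matrix, b: Matrix, beta: float, c: Matrix) -> Matrix:
    """Pure-Python stand-in for the BLAS GEMM contract C <- alpha*A@B + beta*C
    (row-major, no transposes). Not the cuBLAS API; illustration only."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    assert k == k2, "inner dimensions must match"
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def fc_forward(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """Fully connected layer for a whole mini-batch as a single GEMM.
    x: batch x in_features, w: in_features x out_features (assumed layout).
    The bias is pre-broadcast into C, so beta=1 adds it in the same call."""
    c = [list(bias) for _ in x]
    return sgemm(1.0, x, w, 1.0, c)
```

For example, `fc_forward([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]], [0.5, 0.0, -1.0])` gives `[[1.5, 2.0, 3.0], [3.5, 4.0, 9.0]]`. Because the whole batch becomes one large GEMM, a tuned BLAS library handles it well, but with wide layers that GEMM can still dominate runtime.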