by ajtulloch 4165 days ago
This paper describes one part of the fbcunn release (the fast convolution layers implemented via FFT, with the source available at https://github.com/facebook/fbcunn/tree/master/src/cuda/fft). There's a lot more in fbcunn if you want to check it out.
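For anyone unfamiliar with the idea behind FFT-based convolution, here is a minimal 1D sketch in plain Python. It is only an illustration of the convolution theorem (transform both inputs, multiply pointwise, transform back), not the fbcunn code, which works on batched 2D feature maps on the GPU. The function names `fft`, `fft_conv`, and `direct_conv` are made up for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_conv(signal, kernel):
    """Full linear convolution computed in the frequency domain."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:  # pad to a power of two so circular wrap-around cannot occur
        n *= 2
    fa = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    fb = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    prod = [a * b for a, b in zip(fa, fb)]  # pointwise product = convolution
    inv = fft(prod, inverse=True)
    return [v.real / n for v in inv[:out_len]]  # divide by n to normalize the inverse


def direct_conv(signal, kernel):
    """Reference O(N*K) convolution used to check the FFT result."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Calling `fft_conv([1, 2, 3, 4], [1, 0, -1])` should agree with `direct_conv` on the same inputs up to floating-point rounding. The direct version costs O(N*K) while the FFT version costs O(N log N), which is why the approach tends to pay off as kernels get larger.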

In my experience, the fully connected layers are the bottleneck. The other issue is the alternation between compute-heavy convolution and IO-heavy pooling. I'm curious how this FFT implementation stacks up against cuDNN: what's the speedup for the convolutional layers alone, and what's the overall speedup?
http://arxiv.org/pdf/1412.7580v2.pdf compares the convolutional implementation with the cuDNN layers. For the FC layers, it's just cuBLAS `sgemm`.
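To make the "just `sgemm`" point concrete, here is a rough plain-Python sketch of a fully connected forward pass written as a single GEMM-style call, C = alpha * A * B + beta * C. The helpers `gemm`, `transpose`, and `fc_forward` are hypothetical names for this example and are not the cuBLAS or fbcunn API; the weight layout (out_features x in_features) is also an assumption.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def transpose(m):
    """Transpose a row-major matrix."""
    return [list(row) for row in zip(*m)]


def fc_forward(x, weight, bias):
    """Fully connected forward pass: y = x @ weight^T + bias.

    x: batch x in_features, weight: out_features x in_features (assumed layout),
    bias: out_features.
    """
    # Pre-fill the output with the bias so beta = 1 folds the add into the GEMM.
    c = [list(bias) for _ in x]
    return gemm(1.0, x, transpose(weight), 1.0, c)
```

For example, `fc_forward([[1, 2], [3, 4]], [[1, 0], [0, 1], [1, 1]], [0.5, 0.0, -1.0])` gives a 2 x 3 output, one row per batch element. Since the whole layer is one matrix multiply, its speed is essentially whatever the GEMM library delivers for those shapes.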