Hacker News new | ask | show | jobs
by Hydraulix989 4165 days ago
In my experience, the fully connected layers are the bottleneck. The other issue was the alternating compute-heavy convolution and the IO-heavy pooling. I'm curious how this FFT implementation stacks up against cuDNN (what's the speedup like for just the convolutional layers? and then what's the overall speedup like?).
1 comments

http://arxiv.org/pdf/1412.7580v2.pdf compares the convolutional implementation with the cuDNN layers. For the FC layers, it's just CuBLAS `sgemm`.