Hacker News
by easde 2052 days ago
It's true that depthwise convolutions are bandwidth bound, but most networks use them in combination with a kernel size 1 "convolution". If those two operations are tiled and fused together, the result is often compute bound again. Most ML frameworks and libraries, including cuDNN, usually don't do this fusion though.
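
To put rough numbers on that, here is a back-of-the-envelope sketch of arithmetic intensity (FLOPs per byte of off-chip traffic) for the three cases. It is an idealized model, not a measurement: it assumes stride 1 with "same" padding, fp32, and that every tensor is read or written from memory exactly once (no halo re-reads from tiling, no cache effects). The function names and the example layer shape are made up for illustration.

```python
# Idealized roofline-style estimate: FLOPs vs. bytes moved to/from DRAM.
# Assumptions (not from any library): stride 1, "same" padding, each tensor
# touched exactly once, multiply-add counted as 2 FLOPs.

def depthwise_cost(h, w, c, k, bytes_per_elem=4):
    """k x k depthwise conv over c channels of an h x w feature map."""
    flops = 2 * k * k * c * h * w
    # read input + write output + read weights
    bytes_moved = bytes_per_elem * (c * h * w + c * h * w + k * k * c)
    return flops, bytes_moved

def pointwise_cost(h, w, c_in, c_out, bytes_per_elem=4):
    """Kernel size 1 convolution, i.e. a per-pixel matrix multiply."""
    flops = 2 * c_in * c_out * h * w
    bytes_moved = bytes_per_elem * (c_in * h * w + c_out * h * w + c_in * c_out)
    return flops, bytes_moved

def unfused_cost(h, w, c_in, c_out, k, bytes_per_elem=4):
    """Run the two ops back to back; the intermediate goes through memory."""
    dw = depthwise_cost(h, w, c_in, k, bytes_per_elem)
    pw = pointwise_cost(h, w, c_in, c_out, bytes_per_elem)
    return dw[0] + pw[0], dw[1] + pw[1]

def fused_cost(h, w, c_in, c_out, k, bytes_per_elem=4):
    """Same FLOPs, but the depthwise output stays on chip."""
    flops, _ = unfused_cost(h, w, c_in, c_out, k, bytes_per_elem)
    bytes_moved = bytes_per_elem * (
        c_in * h * w + c_out * h * w + k * k * c_in + c_in * c_out
    )
    return flops, bytes_moved

def intensity(cost):
    flops, bytes_moved = cost
    return flops / bytes_moved

if __name__ == "__main__":
    # Hypothetical layer: 56x56 map, 128 -> 128 channels, 3x3 depthwise.
    shape = dict(h=56, w=56, c_in=128, c_out=128, k=3)
    dw = depthwise_cost(shape["h"], shape["w"], shape["c_in"], shape["k"])
    print("depthwise alone:", intensity(dw))
    print("unfused pair:   ", intensity(unfused_cost(**shape)))
    print("fused pair:     ", intensity(fused_cost(**shape)))
```

Under these assumptions the depthwise op alone comes out at only a couple of FLOPs per byte, while the fused pair is more than an order of magnitude higher. Whether that actually crosses into compute bound depends on the FLOP-to-bandwidth ratio of the particular hardware and on the channel counts.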