by jacobn
1491 days ago
Grouped convolutions can't really run faster than groups * conv(ch/groups), i.e. one small convolution per group, and I believe current implementations are close to that bound. Note that below roughly 512 channels (the exact threshold varies by GPU and hardware) you tend to be limited by memory bandwidth, not compute. So depthwise convolutions, where every group holds a single channel, unfortunately end up with terrible performance.
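To put rough numbers on the memory-bound point, here is a back-of-the-envelope sketch. The helper `conv_cost` is hypothetical, not from any library, and it assumes a stride-1, same-padded convolution in float32 where every input, weight, and output element crosses memory exactly once. Real kernels will differ, so treat the figures as indicative only.

```python
def conv_cost(c_in, c_out, h, w, k, groups=1, bytes_per_elem=4):
    """Rough cost of a stride-1, same-padded k x k convolution.

    Returns (macs, bytes_moved, flops_per_byte). Assumes every input,
    weight, and output element crosses memory exactly once.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output element only sees the c_in / groups channels of its group.
    macs = h * w * c_out * (c_in // groups) * k * k
    # Input activations + output activations + weights.
    elems = c_in * h * w + c_out * h * w + c_out * (c_in // groups) * k * k
    bytes_moved = elems * bytes_per_elem
    # Count one multiply-accumulate as two floating-point operations.
    return macs, bytes_moved, 2 * macs / bytes_moved


# Same layer shape, dense vs. depthwise (groups == channels).
dense = conv_cost(256, 256, 56, 56, 3, groups=1)
depthwise = conv_cost(256, 256, 56, 56, 3, groups=256)
print(f"dense:     {dense[2]:.1f} FLOPs/byte")
print(f"depthwise: {depthwise[2]:.1f} FLOPs/byte")
```

Under these assumptions the depthwise layer does 256x fewer multiply-accumulates but moves almost the same amount of data, so its FLOPs-per-byte ratio collapses to single digits, which is consistent with it being bandwidth limited.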
Note that pointwise 1x1 convolutions are also a special case of grouped convolutions (groups = 1 with a 1x1 kernel), and I think they might be specially optimized in PyTorch, though I'd have to run some benchmarks to confirm.
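One reason 1x1 convolutions are easy to optimize is that they reduce to a plain matrix multiply over the channel dimension. The sketch below checks that equivalence in pure Python on nested lists. The function names are made up for illustration, and this says nothing about what PyTorch actually dispatches to internally.

```python
def conv1x1(x, w):
    """Direct 1x1 convolution. x: [C_in][H][W], w: [C_out][C_in]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][i] * x[i][r][c] for i in range(c_in))
              for c in range(wd)]
             for r in range(h)]
            for o in range(len(w))]


def matmul(a, b):
    """Naive matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conv1x1_as_matmul(x, w):
    """Same result via one matrix multiply: w @ x reshaped to C_in x (H*W)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    # Flatten the spatial dimensions so each column is one pixel.
    flat = [[x[i][r][c] for r in range(h) for c in range(wd)]
            for i in range(c_in)]
    out = matmul(w, flat)  # C_out x (H*W)
    # Restore the H x W layout for every output channel.
    return [[row[r * wd:(r + 1) * wd] for r in range(h)] for row in out]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 channels, 2x2 image
w = [[1, 0], [0, 1], [2, -1]]              # 3 output channels
print(conv1x1(x, w) == conv1x1_as_matmul(x, w))
```

Integer inputs are used so the two results can be compared exactly, without floating-point tolerance.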