

by jacobn 1491 days ago

Grouped convolutions can't really run faster than groups * conv(ch/group), i.e. the cost of running one smaller convolution per group, and I believe that's close to where current implementations are?
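To put numbers on the groups * conv(ch/group) bound, here is a rough multiply-accumulate count. It is only a sketch: it assumes a 2D convolution with stride 1 and "same" padding, and the helper name `conv_macs` and the layer sizes are made up for illustration.

```python
def conv_macs(c_in, c_out, k, h, w, groups=1):
    """Multiply-accumulates for a k x k conv producing an h x w output.

    Assumes stride 1 and 'same' padding (illustrative assumptions).
    """
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel only reads the c_in // groups input channels of its group.
    return h * w * c_out * (c_in // groups) * k * k

# Hypothetical layer: 256 channels, 3x3 kernel, 56x56 feature map, 8 groups.
c, k, h, w, g = 256, 3, 56, 56, 8

dense = conv_macs(c, c, k, h, w)                 # groups=1
grouped = conv_macs(c, c, k, h, w, groups=g)     # one grouped conv
per_group = conv_macs(c // g, c // g, k, h, w)   # one small conv on ch/group channels

print(dense // grouped)            # how much less arithmetic than the dense conv
print(grouped == g * per_group)    # grouped cost is exactly groups * conv(ch/group)
```

So the arithmetic drops by a factor of `groups` relative to the dense layer, but it is never less than running `groups` independent small convolutions.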

Note that for channel counts below roughly 512 (the threshold varies by GPU and other hardware) you tend to be limited by memory-transfer speed, not by compute.

So unfortunately depthwise convolutions end up having terrible performance.
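A back-of-the-envelope way to see why depthwise ends up memory-bound is to compare FLOPs per byte moved. This sketch assumes fp32 tensors, stride 1, "same" padding, and an ideal cache in which the input, weights, and output each cross the memory bus exactly once; `arithmetic_intensity` and the layer sizes are hypothetical, not taken from any library.

```python
def arithmetic_intensity(c, k, h, w, groups=1, bytes_per_elem=4):
    """FLOPs per byte for a c -> c channel, k x k conv on an h x w map.

    Idealized: every tensor is read or written exactly once (an assumption).
    """
    macs = h * w * c * (c // groups) * k * k
    flops = 2 * macs                          # one multiply plus one add per MAC
    activations = 2 * c * h * w               # input read + output written
    weights = c * (c // groups) * k * k
    return flops / (bytes_per_elem * (activations + weights))

c, k, h, w = 256, 3, 56, 56
dense_ai = arithmetic_intensity(c, k, h, w)                 # groups=1
depthwise_ai = arithmetic_intensity(c, k, h, w, groups=c)   # one channel per group

print(f"dense:     {dense_ai:.1f} FLOPs/byte")
print(f"depthwise: {depthwise_ai:.2f} FLOPs/byte")
```

Under these assumptions the depthwise layer does only a couple of FLOPs per byte, versus hundreds for the dense layer. To judge whether a layer is memory-bound on a particular GPU, compare these figures against that GPU's ratio of peak FLOP/s to memory bandwidth.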


in3d 1491 days ago

Why wouldn’t you be able to run them in parallel using CUDA? You shouldn’t be memory-transfer speed limited when group convolution layers are a part of a bigger net.

Note that pointwise 1x1 convolutions are a special case of group convolutions and actually I think they might be specially optimized in PyTorch (I’d have to run some benchmarks to test it though).


brrrrrm 1490 days ago

Pointwise isn’t a case of grouped conv; they’re orthogonal ideas.

You can fuse grouped convs (depthwise is a special case of grouped convs) into preceding or following layers. Maybe JAX can do this already? No clue if any library offers such an optimization out of the box.


in3d 1490 days ago

Sorry, yes, I was replying to the post about depthwise convolution and that’s what I meant (though the naming of it is poor) - i.e. the special case of group convolutions where the number of groups is equal to the number of channels.
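For anyone following along, the distinction can be spelled out with a tiny pure-Python reference. This is a sketch under stated assumptions, not how any framework implements it: 1D instead of 2D, stride 1, no padding, cross-correlation as deep learning libraries compute it, and a made-up function name `grouped_conv1d`.

```python
def grouped_conv1d(x, w, groups):
    """x: [c_in][length]; w: [c_out][c_in // groups][k]. Stride 1, no padding."""
    c_in, c_out, k = len(x), len(w), len(w[0][0])
    in_per_g, out_per_g = c_in // groups, c_out // groups
    n_out = len(x[0]) - k + 1
    y = []
    for o in range(c_out):
        g = o // out_per_g                   # group this output channel belongs to
        row = []
        for t in range(n_out):
            acc = 0.0
            for i in range(in_per_g):        # only the input channels of group g
                xi = x[g * in_per_g + i]
                for j in range(k):
                    acc += w[o][i][j] * xi[t + j]
            row.append(acc)
        y.append(row)
    return y

x = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]                   # 2 channels, length 4

# Depthwise: groups == channels, so each filter sees exactly one input channel.
w_dw = [[[1.0, -1.0]], [[0.5, 0.5]]]
depthwise = grouped_conv1d(x, w_dw, groups=2)

# Pointwise: kernel size 1 with groups=1, so each output mixes all channels.
w_pw = [[[1.0], [10.0]]]
pointwise = grouped_conv1d(x, w_pw, groups=1)

print(depthwise)
print(pointwise)
```

Depthwise is about the `groups` argument (channels never mix), while pointwise is about the kernel size (positions never mix), which is why the two are independent knobs.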
