|
|
|
|
|
by in3d
1491 days ago
|
|
Why wouldn’t you be able to run them in parallel using CUDA? You shouldn’t be memory-transfer speed limited when group convolution layers are a part of a bigger net. Note that pointwise 1x1 convolutions are a special case of group convolutions and actually I think they might be specially optimized in PyTorch (I’d have to run some benchmarks to test it though). |
|
You can fuse grouped convs (depthwise is a special case of grouped convs) into preceding or following layers. Maybe JAX can do this already? No clue if any library offers such an optimization out of the box