Hacker News
by tbalsam 1985 days ago
MV2 (MobileNetV2) is memory-bound, and its depthwise/grouped + 1x1 convs incur long kernel launch times on GPU. Shattered kernels like these are fine on CPU, but not on GPU.
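To make the "memory-bound" point concrete, here is a rough back-of-envelope sketch of arithmetic intensity (FLOPs per byte of DRAM traffic) for a depthwise 3x3 conv versus a 1x1 conv and a dense 3x3 conv. The layer shape (56x56, 144 channels) and the traffic model (fp32, every tensor read or written from DRAM exactly once, no cache effects) are hypothetical simplifying assumptions, not measurements. The sketch also ignores kernel launch overhead entirely, so it only illustrates the bandwidth half of the argument.

```python
# Rough arithmetic-intensity estimate (FLOPs per byte of DRAM traffic) for
# convolution layers. Simplifying assumptions (hypothetical, for illustration):
#   - fp32 tensors (4 bytes/element), stride 1, "same" padding
#   - input, weights and output each move through DRAM exactly once
#   - one multiply-add counts as 2 FLOPs
#   - kernel launch overhead and caching are not modeled

BYTES_PER_ELEM = 4


def conv_intensity(h, w, c_in, c_out, k, groups=1):
    """Return (flops, bytes_moved, flops_per_byte) for a k x k grouped conv."""
    c_in_per_group = c_in // groups
    # Each output element sums over its group's input channels and the k x k window.
    macs = h * w * c_out * c_in_per_group * k * k
    flops = 2 * macs
    weights = c_out * c_in_per_group * k * k
    activations = h * w * (c_in + c_out)  # input + output feature maps
    bytes_moved = BYTES_PER_ELEM * (weights + activations)
    return flops, bytes_moved, flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical layer shape: 56x56 feature map, 144 channels in and out.
    for name, kwargs in [
        ("depthwise 3x3", dict(k=3, groups=144)),
        ("pointwise 1x1", dict(k=1)),
        ("dense 3x3", dict(k=3)),
    ]:
        flops, nbytes, ai = conv_intensity(56, 56, 144, 144, **kwargs)
        print(f"{name:14s} {flops / 1e6:8.1f} MFLOP  {nbytes / 1e6:5.2f} MB  {ai:6.1f} FLOP/B")
```

Under these assumptions the depthwise layer moves about as many bytes as the dense 3x3 layer's activations but does over 100x fewer FLOPs, landing around 2 FLOPs/byte versus roughly 35 for the 1x1 conv and 270 for the dense 3x3. That is far below the ratio a GPU typically needs to keep its ALUs busy, so the depthwise step ends up waiting on memory.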

Though per your note on the scales, those are really interesting empirical results. I'll have to look into that; thanks for passing it along.