|
|
|
|
|
by touisteur
553 days ago
|
|
You can mostly compute it for yourself or get an asymptotic feeling. Matmul is N^3 even with tensor cores eating a large part of it (with diminished precision, but ok) and FFT-based convolution is NlogN mostly. At some point - not that high - even the available TFLOPS available on tensor cores can't keep up. I was surprised by how far cuDNN will take you, but at some point the curves cross and matmul-based convolution time keeps growing at polynomial rate and FFT-based mostly linear (until it gets memory bound) and even that can be slashed down with block-based FFT schemes. Look up the thesis I posted earlier there's an introduction in it. |
|
are we talking about matmul or conv? because no one, today, is using matmul for conv (at least not gemm - igemm isn't the same thing).
like i said - the arms race has been going for ~10 years and everyone already discovered divide-conquer and twiddle factors a long time ago. if you do `nm -C libcudnn.so` you will see lots of mentions of winograd and it's not because the cudnn team likes grapes.
> Look up the thesis I posted earlier there's an introduction in it.
there are a billion theses on this. including mine. there are infinte tricks/approaches/methods for different platforms. that's why saying something like "fft is best" is silly.