by shaklee3 658 days ago
I don't mean a single FFT. I mean that FFT algorithms are inherently not going to use the GPU at 100% utilization by any metric.

Not so inherently IMO.

What I mean is: where did you get that from? I program FFTs on GPUs, and I see no reason for the "inherently can't reach 100% utilization by any metric" claim.

I interpret that comment as saying you're not going to be using every silicon block that the GPU provides, like the video codec and rasterization hardware. If you've maxed out compute without going over the power budget, for example, you'd likely still be able to decode video if the GPU has a separate block for it.

I had a similar read. I packed a lot of parallel FFTs and other processing into custom TI DSP cards, but the DSP family chips were RISC and carried little 'baggage': just fat, fat 32-bit | 64-bit floating-point pipelines with instruction sets optimised for modular ring indexing of scalar | vector operations.

Even then they ran at 80% "by design" for expected hard real-time usage. They only went to 11 and dropped results in "toast until they smoke" tests and with operators who redlined limits (and got feedback to that effect).

I'd be curious to see how you can do it. Try launching an FFT of any size and batch count and see if you can hit 100%.
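
For anyone who wants to put rough numbers on the "by which metric" question before benchmarking, here is a back-of-envelope roofline sketch. It is not a measurement and not anyone's actual FFT code. It assumes the conventional 5 N log2 N flop estimate for a radix-2 complex FFT, and it assumes each element is read from and written to device memory exactly once per transform, which only holds if the whole transform fits on-chip. The function name `fft_roofline` and the peak figures in the example call are made up for illustration.

```python
import math

def fft_roofline(n, peak_flops, peak_bytes_per_s, bytes_per_elem=8):
    """Upper bounds for a batched length-n complex FFT (hypothetical helper).

    bytes_per_elem=8 corresponds to single-precision complex values.
    """
    flops = 5.0 * n * math.log2(n)       # conventional radix-2 flop estimate
    traffic = 2.0 * n * bytes_per_elem   # assumed: one read and one write per element
    intensity = flops / traffic          # flops per byte of memory traffic
    ridge = peak_flops / peak_bytes_per_s
    # Roofline: the lower of the compute roof and the bandwidth roof
    attainable = min(peak_flops, intensity * peak_bytes_per_s)
    return {
        "intensity": intensity,
        "ridge": ridge,
        "compute_util": attainable / peak_flops,
        "bandwidth_util": (attainable / intensity) / peak_bytes_per_s,
    }

# Made-up device: 10 TFLOP/s peak compute, 1 TB/s peak memory bandwidth
r = fft_roofline(4096, peak_flops=10e12, peak_bytes_per_s=1e12)
print(r)
```

With those made-up peaks, a length-4096 transform comes out at 3.75 flops per byte against a ridge point of 10, so the model caps compute utilization at 37.5% while the memory bandwidth bound sits at 100%. Under these assumptions, whether an FFT can "hit 100%" depends on whether you are counting ALU throughput or memory bandwidth, and a real kernel would still have to be profiled to see how close it gets to either bound.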