by colanderman
4573 days ago
Congratulations, you have proved the tautology that hardware-accelerated frequency-domain codecs use hardware-accelerated frequency-domain transforms. Unfortunately you entirely missed my point about everything other than video decoding.

Bandwidth between the CPU and GPU quickly becomes the bottleneck unless you're able to move most of your processing onto the GPU, which I granted you was the right thing to do. But as I also stated, none of the popular software I use actually does this. It is all optimized for CPU processing.

DXVA suffers from this same issue, i.e. you have to be very careful about moving data to and from the GPU: http://en.wikipedia.org/wiki/DXVA#DXVA2_implementations:_nat...

EDIT: And in case you think I'm talking out of my ass, I work on a high-performance embedded product. We recently switched from a 32-bit to a 64-bit version of the (ARM-like) processor we use. Nearly every one of our major algorithms benefited from the increased register width (although we did have to slightly modify some of them to do so). And we don't even use multimedia operations. A lot of the gains come from simply moving less stuff around, which, when you have to process a packet every 40 cycles, really adds up.
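
To make the "moving less stuff around" point concrete, here is a minimal sketch in C. It is an assumption-laden illustration, not the product's actual code: the XOR checksum and the names `xor_fold32` and `xor_fold64` are hypothetical. Both functions compute the same 32-bit result over a packet buffer, but the 64-bit version needs roughly half as many loads and XORs for the same number of bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: XOR all 32-bit words of a buffer together.
 * One load and one XOR per 4 bytes; a short tail is zero-padded. */
uint32_t xor_fold32(const uint8_t *buf, size_t len)
{
    uint32_t acc = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);  /* alignment-safe load */
        acc ^= w;
    }
    if (i < len) {                      /* 1 to 3 leftover bytes */
        uint32_t w = 0;
        memcpy(&w, buf + i, len - i);
        acc ^= w;
    }
    return acc;
}

/* Same checksum using 64-bit registers: one load and one XOR per
 * 8 bytes, then the two 32-bit halves are folded together once. */
uint32_t xor_fold64(const uint8_t *buf, size_t len)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);  /* alignment-safe load */
        acc ^= w;
    }
    if (i < len) {                      /* 1 to 7 leftover bytes */
        uint64_t w = 0;
        memcpy(&w, buf + i, len - i);
        acc ^= w;
    }
    return (uint32_t)(acc ^ (acc >> 32));
}
```

Whether the halved loop count turns into a real speedup depends on the compiler, the memory system, and the target CPU, so treat this as an illustration of the idea rather than a benchmark.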