by dragontamer
4573 days ago
DxVA passes tasks like iDCT (Inverse Discrete Cosine Transform) to the GPU on Windows. If you are running ANY DxVA codec on any Windows computer, the process happens exactly as I've described. In fact, Intel's DxVA implementation explicitly has an iDCT accelerator. See this paper for details: http://download-software.intel.com/sites/default/files/artic...
I assume a lot of people watch YouTube on Windows computers, amirite?
The iDCT is basically a Fourier Transform as far as the math is concerned. Other portions of the H.264 codec (such as motion compensation) are likewise increasingly hardware-accelerated, even on crappy integrated GPUs like the old GMA950.
Phone hardware, on the other hand, is basically state-of-the-art. I wouldn't be surprised if today's phones had better hardware decoders than the crap that Intel churned out for bottom-barrel consumers back in 2009.
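To make the "iDCT is basically a Fourier Transform" remark concrete, here is a rough pure-Python sketch. It is not how DxVA or any hardware decoder implements it; the function names (`dft`, `dct2_direct`, `dct2_via_dft`, `idct`) and the unnormalized scaling are assumptions chosen for illustration. It computes a DCT-II two ways (directly, and by taking a DFT of the mirrored input) and then undoes it with the matching inverse.

```python
import cmath
import math

def dft(y):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(y)
    return [sum(y[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
            for k in range(n)]

def dct2_direct(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2) * k)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]

def dct2_via_dft(x):
    """Same DCT-II, obtained from a length-2N DFT of x followed by its mirror."""
    n = len(x)
    spectrum = dft(list(x) + list(reversed(x)))
    # A half-sample phase shift turns the first N DFT bins into 2 * DCT-II.
    return [0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
            for k in range(n)]

def idct(coeffs):
    """Inverse of the unnormalized DCT-II above (a scaled DCT-III)."""
    n = len(coeffs)
    return [(2.0 / n) * (coeffs[0] / 2.0 +
                         sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k)
                             for k in range(1, n)))
            for t in range(n)]

samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # one hypothetical 8-point row
direct = dct2_direct(samples)
via_dft = dct2_via_dft(samples)
restored = idct(direct)
```

Real codecs use 2-D block transforms (H.264 in particular uses an integer approximation), so treat this only as the textbook relationship between the two transforms.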
Unfortunately you entirely missed my point about everything other than video decoding. Bandwidth between the CPU and GPU quickly becomes the bottleneck, unless you're able to move most of your processing onto the GPU, which I granted you was the right thing to do. But also as I stated, none of the popular software I use actually does this. It is all optimized for CPU processing.
DxVA suffers from this same issue, i.e. you have to be very careful around moving data to & from the GPU: http://en.wikipedia.org/wiki/DXVA#DXVA2_implementations:_nat...
EDIT: And in case you think I'm talking out of my ass, I work on a high-performance embedded product. We recently switched from a 32-bit to a 64-bit version of the (ARM-like) processor we use. Nearly every single one of our major algorithms benefited from the increased register width (although we did have to slightly modify some of them to do so). And we don't even use multimedia operations. A lot of the gains come from simply moving less stuff around, which, when you have to process a packet every 40 cycles, really adds up.
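A toy illustration of the "moving less stuff around" point, not our actual code: the 64-byte packet size and the `copy_by_words` helper are assumptions made up for this sketch. It copies the same buffer one register-sized word at a time and counts the moves for 32-bit versus 64-bit words.

```python
import struct

def copy_by_words(packet: bytes, fmt: str):
    """Copy `packet` one fixed-size word at a time; return (copy, move count)."""
    out = bytearray()
    moves = 0
    for (word,) in struct.iter_unpack(fmt, packet):
        out += struct.pack(fmt, word)  # one load plus one store per word
        moves += 1
    return bytes(out), moves

packet = bytes(range(64))  # hypothetical 64-byte packet

copy32, moves32 = copy_by_words(packet, "<I")  # 32-bit words
copy64, moves64 = copy_by_words(packet, "<Q")  # 64-bit words
print(moves32, moves64)
```

With a budget on the order of 40 cycles per packet, halving the number of word moves is a large share of the total, which is the effect described above.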