Hacker News
by colanderman 4573 days ago
Mobile multimedia developers rely on hardware decode for codecs, Fourier transforms, and the like

Uh, no? Yes, video decoding for common formats is hardware-accelerated, but I've never seen dedicated Fourier transform hardware in consumer devices, and I can't think of any other "and the like" algorithms that are hardware-accelerated other than at the CPU register level.

Game programmers will prefer a faster GPU, since none of that stuff is actually calculated on CPUs nowadays

Mm, I think this is dubious. I agree that GPUs are better than CPUs for many multimedia applications, but getting data to and from the GPU is not fast. And of all the multimedia applications I run on my desktop (mplayer, Audacity, GIMP, Inkscape), none currently use the GPU, except possibly mplayer for certain videos.


DXVA (DirectX Video Acceleration) passes tasks like the iDCT (Inverse Discrete Cosine Transform) to the GPU on Windows. If you are running ANY DXVA codec on any Windows computer, the process happens exactly as I've described.

In fact, Intel's DXVA implementation explicitly has an iDCT accelerator. See this paper for details: http://download-software.intel.com/sites/default/files/artic...

I assume a lot of people watch YouTube on Windows computers, amirite? As far as the math is concerned, the iDCT is basically a Fourier transform. Other parts of the H.264 codec (such as motion compensation) are also increasingly hardware-accelerated... even on crappy integrated GPUs like the old GMA 950.
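As a rough illustration of the "basically a Fourier transform" claim, the Python sketch below computes an unnormalized 1-D DCT-II of a hypothetical 8-sample row directly from its definition, then gets the same coefficients from a plain discrete Fourier transform of the row mirrored onto itself, and finally inverts them with the matching iDCT (a scaled DCT-III). It is a floating-point, O(N^2) textbook version chosen for clarity; real codecs use fast algorithms, and H.264 in particular specifies small integer approximations of the DCT, which is what hardware decoders actually implement.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
            for k in range(n_len)]


def idct(X):
    """Inverse of dct2 above: a DCT-III scaled so that idct(dct2(x)) == x."""
    n_len = len(X)
    out = []
    for n in range(n_len):
        s = X[0] / n_len
        for k in range(1, n_len):
            s += 2 / n_len * X[k] * math.cos(math.pi / n_len * (n + 0.5) * k)
        out.append(s)
    return out


def dft(y):
    """Naive discrete Fourier transform: Y[k] = sum_n y[n] * exp(-2*pi*i*n*k/M)."""
    m = len(y)
    return [sum(y[n] * cmath.exp(-2j * math.pi * n * k / m) for n in range(m))
            for k in range(m)]


def dct2_via_dft(x):
    """The same DCT-II, taken from the DFT of the mirrored sequence x + reversed(x)."""
    n_len = len(x)
    Y = dft(list(x) + list(reversed(x)))
    # A half-sample phase shift turns each DFT bin into a real cosine sum.
    return [(0.5 * cmath.exp(-1j * math.pi * k / (2 * n_len)) * Y[k]).real
            for k in range(n_len)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-pixel row
coeffs = dct2(row)
print(max(abs(a - b) for a, b in zip(coeffs, dct2_via_dft(row))) < 1e-9)  # True
print(max(abs(a - b) for a, b in zip(row, idct(coeffs))) < 1e-9)  # True
```

Both checks print True: the DCT coefficients are the real part of a phase-shifted DFT of the mirrored signal, which is the sense in which the iDCT is "basically" a Fourier transform.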

Phone hardware, on the other hand, is basically state-of-the-art. I wouldn't be surprised if today's phones had better hardware decoders than the crap that Intel churned out for bottom-of-the-barrel consumers back in 2009.

Congratulations, you have proved the tautology that hardware-accelerated frequency-domain codecs use hardware-accelerated frequency-domain transforms.

Unfortunately, you entirely missed my point about everything other than video decoding. Bandwidth between the CPU and GPU quickly becomes the bottleneck unless you're able to move most of your processing onto the GPU, which, I granted, is the right thing to do. But, as I also stated, none of the popular software I use actually does this. It is all optimized for CPU processing.
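To put rough numbers on the bandwidth argument, here is a minimal cost model in Python. Every figure in it (buffer size, per-stage times, bus bandwidth) is a hypothetical placeholder, not a measurement, and the model ignores latency and any overlap of copies with compute. It compares three options: staying on the CPU, offloading each stage with its own round trip over the bus, and uploading once so that all stages run on the GPU.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload parameters; none of these are measured values."""
    data_mb: float            # size of the buffer every stage reads and writes
    stages: int               # number of processing stages applied in sequence
    cpu_ms_per_stage: float   # compute time per stage on the CPU
    gpu_ms_per_stage: float   # compute time per stage on the GPU
    bus_mb_per_ms: float      # effective CPU<->GPU bandwidth (1 MB/ms is roughly 1 GB/s)


def transfer_ms(w: Workload) -> float:
    """Time to copy the buffer across the bus once, in one direction."""
    return w.data_mb / w.bus_mb_per_ms


def cpu_only_ms(w: Workload) -> float:
    """Every stage runs on the CPU; nothing crosses the bus."""
    return w.stages * w.cpu_ms_per_stage


def gpu_round_trip_each_stage_ms(w: Workload) -> float:
    """Each stage uploads the buffer, computes on the GPU, and downloads the result."""
    return w.stages * (2 * transfer_ms(w) + w.gpu_ms_per_stage)


def gpu_resident_ms(w: Workload) -> float:
    """Upload once, run every stage on the GPU, download once."""
    return 2 * transfer_ms(w) + w.stages * w.gpu_ms_per_stage


w = Workload(data_mb=64.0, stages=4, cpu_ms_per_stage=10.0,
             gpu_ms_per_stage=1.0, bus_mb_per_ms=8.0)
print(cpu_only_ms(w), gpu_round_trip_each_stage_ms(w), gpu_resident_ms(w))  # 40.0 68.0 20.0
```

With these made-up numbers, per-stage offload (68 ms) loses to the CPU (40 ms) even though the GPU computes each stage ten times faster, while keeping the data resident on the GPU (20 ms) wins; that last case is the "move most of your processing onto the GPU" scenario.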

DXVA suffers from this same issue, i.e. you have to be very careful about moving data to and from the GPU: http://en.wikipedia.org/wiki/DXVA#DXVA2_implementations:_nat...

EDIT: And in case you think I'm talking out of my ass, I work on a high-performance embedded product. We recently switched from a 32-bit to a 64-bit version of the (ARM-like) processor we use. Nearly every single one of our major algorithms benefited from the increased register width (although we did have to slightly modify some of them to do so). And we don't even use multimedia operations. A lot of the gains come from simply moving less stuff around, which, when you have to process a packet every 40 cycles, really adds up.
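The commenter doesn't say what their algorithms are, so the following is only a hypothetical illustration of why wider registers can help even without multimedia instructions. An XOR checksum over a packet buffer (a stand-in for any word-at-a-time loop; the function names are made up for this sketch) needs half as many word loads and XOR operations when it works in 64-bit words instead of 32-bit words, and folding the 64-bit result back down to 32 bits gives the same answer. Python's integers are arbitrary-precision, so this counts operations rather than timing anything.

```python
def xor_fold(data: bytes, word_bytes: int) -> tuple[int, int]:
    """XOR together all little-endian words of `word_bytes` bytes (hypothetical helper).

    Returns (folded value, number of word-sized XOR operations performed).
    """
    remainder = len(data) % word_bytes
    if remainder:
        # Zero padding does not change an XOR, so results stay comparable.
        data = data + bytes(word_bytes - remainder)
    acc = 0
    ops = 0
    for i in range(0, len(data), word_bytes):
        acc ^= int.from_bytes(data[i:i + word_bytes], "little")
        ops += 1
    return acc, ops


def reduce_to_32(value: int) -> int:
    """Fold a 64-bit value down to 32 bits by XOR-ing its high and low halves."""
    return (value & 0xFFFFFFFF) ^ (value >> 32)


packet = bytes(range(64))  # hypothetical 64-byte packet
fold32, ops32 = xor_fold(packet, 4)
fold64, ops64 = xor_fold(packet, 8)
print(ops32, ops64)                    # 16 8
print(fold32 == reduce_to_32(fold64))  # True
```

For a sense of the budget: at a hypothetical 1 GHz clock, one packet every 40 cycles is 25 million packets per second, so removing even a few operations per packet matters.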