Hacker News
by bee_rider 969 days ago
I wonder if encode could run on the iGPU?

I think, at least, that one of the biggest use cases for encode is game streamers (is this right?). They should have decent dGPUs anyway, so their iGPU is just sitting there.

2 comments

Elemental wrote a GPU shader h264 encoder for the Radeon 5870 back in the day, marketed towards broadcasters who needed quality and throughput: https://www.anandtech.com/show/2586

Intel used to write hybrid encoders (that used some fixed function and some iGPU shader) for their older iGPUs.

So the answer is yes... if you can fund the right group. But video encoders are hard. The kind of crack developer teams that can pull this off don't grow on trees.

Shaders have little benefit for anything with "compression" in the name. (De)compression is maximally serial/unpredictable because if any of it is predictable, it's not compressed enough.

People used to want to write them because they thought GPU=fast and shaders=GPU, but this is just evidence that almost no one knows how to write a video codec.

The Elemental encoder was supposedly quite good, but it was a different time. There was no hardware encoding, and high res realtime h264 was difficult.

That's not really true; the motion estimation stage is highly parallel. Intel's wavefront-parallel GPU motion estimation was really cool. Real world compression algorithms are nowhere close to optimal partly because it's worth trading off a little compression ratio to make the algorithm parallel.

IIRC x264 does have a lookahead motion estimation that can run on the GPU, but I wasn't sure I could explain this properly.

That said, I disagree: while motion estimation is parallel, motion coding is not, because it has to be "rate-distortion optimal" (depending on your quality/speed tradeoff). So finding the best motion vector for a block depends on what the entropy coder state was after the previous block, because the encoder can save a lot of bits by coding inaccurate/biased motion.
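To make the distinction concrete, here is a toy 1-D sketch (not taken from x264 or any real encoder). It assumes a made-up bit-cost function `mv_bits` standing in for the entropy coder, and every name in it is hypothetical. The pure distortion search looks at each block on its own, while the rate-distortion search has to know the previous block's vector before it can price the current one.

```python
# Toy 1-D illustration; all names are hypothetical, not from x264 or ffmpeg.

def sad(block, ref, pos):
    """Sum of absolute differences between a block and ref starting at pos."""
    return sum(abs(c - ref[pos + i]) for i, c in enumerate(block))

def mv_bits(mv, predicted_mv):
    """Assumed stand-in for entropy-coder cost: larger residuals cost more."""
    return 1 + 2 * abs(mv - predicted_mv)

def candidates(start, size, ref_len, search):
    """Motion vectors that keep the block inside the reference signal."""
    return [mv for mv in range(-search, search + 1)
            if start + mv >= 0 and start + mv + size <= ref_len]

def estimate_independent(cur, ref, block, search):
    """Distortion-only search: each block reads only cur and ref, so the
    blocks could be processed in any order, or all at once."""
    mvs = []
    for start in range(0, len(cur), block):
        blk = cur[start:start + block]
        cands = candidates(start, len(blk), len(ref), search)
        mvs.append(min(cands, key=lambda mv: (sad(blk, ref, start + mv), abs(mv))))
    return mvs

def estimate_rd(cur, ref, block, search, lam):
    """Rate-distortion search: cost = SAD + lam * bits. The bits depend on the
    vector chosen for the previous block, which forces a serial scan."""
    mvs = []
    prev_mv = 0
    for start in range(0, len(cur), block):
        blk = cur[start:start + block]
        cands = candidates(start, len(blk), len(ref), search)
        best = min(cands, key=lambda mv: (sad(blk, ref, start + mv)
                                          + lam * mv_bits(mv, prev_mv), abs(mv)))
        mvs.append(best)
        prev_mv = best  # the next block's cost depends on this choice
    return mvs

ref = [0, 10, 20, 20, 20, 20]
cur = [10, 20, 20, 20]
print(estimate_independent(cur, ref, block=2, search=1))  # [1, 0]
print(estimate_rd(cur, ref, block=2, search=1, lam=1))    # [1, 1]
```

The second block is flat, so vectors 0 and 1 match equally well. The independent search picks 0, but the rate-aware search picks 1 because repeating the previous block's vector is cheaper to code under the assumed cost model. That is the "biased motion" effect, and it only shows up once the previous decision is known.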

That's why x264 and ffmpeg use per-frame CPU threading instead (which I wrote the decoding side of) because the entropy coder resets across frames.
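A minimal sketch of why the reset matters, assuming a toy adaptive binary model rather than a real CABAC implementation (`code_frame` and the other names are hypothetical). Within a frame each symbol's cost depends on everything coded before it, but because the model starts fresh for every frame, whole frames can be handed to separate workers and give the same result as a serial pass.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def code_frame(symbols):
    """Estimate the bits for one frame of binary symbols with an adaptive
    model. The counts start fresh here, which plays the role of the reset."""
    counts = [1, 1]
    bits = 0.0
    for s in symbols:
        p = counts[s] / (counts[0] + counts[1])
        bits += -math.log2(p)
        counts[s] += 1  # serial dependency, but only inside this frame
    return bits

def code_serial(frames):
    """Code the frames one after another."""
    return [code_frame(f) for f in frames]

def code_threaded(frames, workers=4):
    """Code each frame in its own task; no state is shared between frames."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(code_frame, frames))

frames = [[0, 0, 1], [1, 1, 1, 0], [0]]
assert code_threaded(frames) == code_serial(frames)
```

This only illustrates the independence of the entropy-coder state. Real frame threading also has to cope with frames that reference each other, which the sketch ignores, and Python threads will not show a real speedup for this kind of CPU-bound work.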