by DaleCurtis
969 days ago
* Hmm, what kind of scheme are you thinking beyond per frame QP? Does an abstraction on top of QP work for the case you have in mind?
* Reference frame control seems to be https://github.com/w3c/webcodecs/issues/285, there's some interest in this for 2024, so I'd expect progress here.
* Does splitting frames in WebGPU/WebGL work for the use case here? I'm not sure we could do anything internally (we're at the mercy of hardware decode implementations) without implementing such a shader.
Ideally I'd like to be able to set the CBR / VBR bitrate instead of some vague QP parameter that I manually have to profile to figure out how it corresponds to a bitrate for a given encoder. Of course, maybe encoders don't actually support this? I can't recall. It's been a while.
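For what it's worth, here is a rough sketch of what "ask for a bitrate, not a QP" could look like. The field names (`bitrate`, `bitrateMode`, `latencyMode`) mirror the WebCodecs `VideoEncoderConfig` dictionary. The `EncoderConfigSketch` type, the `makeRateControlledConfig` helper, and the codec string and resolution are placeholders made up for illustration. Whether a given hardware encoder actually honors the requested mode is a separate question.

```typescript
// Local stand-in for the WebCodecs VideoEncoderConfig dictionary, so the
// sketch type-checks outside a browser. Only a subset of fields is shown.
type BitrateMode = "constant" | "variable" | "quantizer";

interface EncoderConfigSketch {
  codec: string;
  width: number;
  height: number;
  framerate: number;
  bitrate: number; // target, in bits per second
  bitrateMode: BitrateMode;
  latencyMode: "quality" | "realtime";
}

// Hypothetical helper: build a config that requests CBR or VBR at a target
// bitrate instead of driving the encoder with per-frame QP values.
function makeRateControlledConfig(
  mode: "constant" | "variable",
  bitsPerSecond: number,
): EncoderConfigSketch {
  if (!Number.isFinite(bitsPerSecond) || bitsPerSecond <= 0) {
    throw new RangeError("bitrate must be a positive number of bits per second");
  }
  return {
    codec: "avc1.42001f", // placeholder: H.264 Baseline, level 3.1
    width: 1280,          // placeholder resolution
    height: 720,
    framerate: 60,
    bitrate: bitsPerSecond,
    bitrateMode: mode,
    latencyMode: "realtime",
  };
}
```

In a browser you would presumably pass the result to `VideoEncoder.isConfigSupported()` first and fall back if the combination is rejected.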
> Does splitting frames in WebGPU/WebGL work for the use case here? I'm not sure we could do anything internally (we're at the mercy of hardware decode implementations) without implementing such a shader.
I don't think you need a shader. We did it at Oculus Link with existing HW encoders and it worked fine (at least for AMD and NVidia - not 100% sure about Intel's capabilities). It did require some bit munging of the NVidia H.264 bitstream to make the parallel QCOM decoders happy with slices coming from a single encoder session*, but it wasn't that significant a problem.
For video streaming, supporting a standard for webcams to deliver slices with timestamped information about the rolling shutter (+ maybe IMU data for mobile use cases) would help create a market for premium low-latency webcams. You'd need to figure out how to implement just-in-time rolling shutter corrections on the display side to mitigate the downsides of rolling shutter, but the extra IMU information would be very useful (many mobile camera display packages support this functionality). VR displays often have rolling shutter, so a rolling shutter webcam + display together would really make it possible to do "just in time" corrections for where pixels end up to adjust for latency. I'm not sure how much you'd get out of that, but my hunch is that if you work out all the details you should be able to shave off nearly a frame of glass-to-glass latency.
Speaking of adjustments, extracting motion vectors from the video is also useful, at least for VR, so that you can give the compositor the relevant information to apply last-minute corrections for that "locked to your motion" feeling (counteracts motion sickness).
On a related note, with HW GPU encoders, it would be nice to have the webcam frame sent from the webcam directly to the GPU instead of round-tripping into a CPU buffer that you then either transport to the GPU or encode on the CPU - this should save a few ms of latency. Think NVidia's Direct standards but extended so that the GPU can grab the frame from the webcam, encode & maybe even send it out over Ethernet directly (the Ethernet part would be particularly valuable for tech like Stadia / GeForce Now). I know the HW standards for that don't actually exist yet, but it might be interesting to explore with NVidia, AMD, and Intel what HW acceleration of that data path might look like.
* NVidia's encoder supports slices directly and has an artificial limit on the number of encoder sessions on consumer drivers (they raised it in the past few years but IIRC it's still anemic). That, however, means that the generated slices have some incorrect parameters in the bitstream if you want to decode them independently. So you have to muck with the bitstream in a trivial way so that the decoders see independent, valid H.264 bitstreams they can decode. On AMD you don't have a limit on the number of encoder sessions.
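To give a flavor of the first step of that kind of munging, here is a minimal sketch that splits an H.264 Annex B byte stream into NAL units and picks out the coded slices. It assumes the encoder emits Annex B output with `00 00 01` / `00 00 00 01` start codes. `splitAnnexB` and `isCodedSlice` are names made up for this example, not anything from the Oculus code. The actual header rewriting needed to make each slice independently decodable is not attempted here.

```typescript
interface NalUnit {
  type: number;     // nal_unit_type: low 5 bits of the first NAL byte
  data: Uint8Array; // NAL bytes without the start code
}

// Split an Annex B byte stream on 00 00 01 start codes. A 4-byte start code
// is simply a 3-byte one preceded by an extra zero byte, which is trimmed
// from the end of the previous unit below.
function splitAnnexB(stream: Uint8Array): NalUnit[] {
  const payloadStarts: number[] = [];
  for (let i = 0; i + 2 < stream.length; i++) {
    if (stream[i] === 0 && stream[i + 1] === 0 && stream[i + 2] === 1) {
      payloadStarts.push(i + 3);
      i += 2; // jump past the start code
    }
  }

  const units: NalUnit[] = [];
  for (let k = 0; k < payloadStarts.length; k++) {
    const begin = payloadStarts[k];
    let end = k + 1 < payloadStarts.length ? payloadStarts[k + 1] - 3 : stream.length;
    // A NAL unit never ends in 0x00, so trailing zeros are padding or the
    // leading zero of the next 4-byte start code.
    while (end > begin && stream[end - 1] === 0) end--;
    if (end > begin) {
      units.push({ type: stream[begin] & 0x1f, data: stream.subarray(begin, end) });
    }
  }
  return units;
}

// nal_unit_type 1 is a non-IDR coded slice and 5 is an IDR coded slice;
// 7 and 8 are SPS and PPS.
function isCodedSlice(unit: NalUnit): boolean {
  return unit.type === 1 || unit.type === 5;
}
```

From there, each slice could be routed to its own decoder together with a copy of the SPS/PPS, after whatever per-slice fixes the decoder needs.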