by vlovich123 968 days ago
There are a few that would be neat:

* This may already be possible, but it’s not immediately clear how to change the encoder's bitrate dynamically when doing VBR/CBR (it seems you can only do it with per-frame quantization params, which isn’t very friendly)

* being able to specify the reference frame to use for encoding P-frames

* being able to generate slices efficiently / display them easily. For example, Oculus Link encodes 1/n of the video in parallel encoders and decodes similarly. This way your encoding time only contributes 1/n frame encode/decode worth of latency because the rest is amortized with tx+decode of other slices. I suspect the biggest requirement here is to be able to cheaply and easily get N VideoFrames OR be able to cheaply split a VideoFrame into horizontal or vertical slices.
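To put rough numbers on the slicing argument above, here is a minimal sketch of a latency model. It is an illustration under stated assumptions, not a description of how Oculus Link works internally: it assumes the n slices are encoded in parallel, sent over the link one after another, and decoded as each arrives, and that every stage's cost scales linearly with slice size. The function names and the timings in the usage note are made up for the example.

```typescript
// Per-frame cost of each stage if the whole frame is handled as one unit.
interface StageTimesMs {
  encode: number;
  transmit: number;
  decode: number;
}

// Baseline: encode the full frame, then send it, then decode it.
function unslicedLatencyMs(t: StageTimesMs): number {
  return t.encode + t.transmit + t.decode;
}

// Assumed model: n encoders run in parallel on 1/n of the frame each,
// slices go over the link serially, and each slice is decoded on arrival.
function slicedLatencyMs(t: StageTimesMs, n: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const perSliceEncode = t.encode / n;
  const perSliceTransmit = t.transmit / n;
  const perSliceDecode = t.decode / n;

  let lastDecodeDone = 0;
  for (let i = 0; i < n; i++) {
    // Slice i has fully arrived once slices 0..i have been transmitted.
    const arrival = perSliceEncode + (i + 1) * perSliceTransmit;
    lastDecodeDone = Math.max(lastDecodeDone, arrival + perSliceDecode);
  }
  return lastDecodeDone;
}
```

With hypothetical timings of 8 ms encode, 4 ms transmit and 8 ms decode, this model gives 20 ms unsliced and 8 ms with n = 4: transmit time is unchanged, while encode and decode each contribute only 1/n of their full-frame cost.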

1 comments

* Hmm, what kind of scheme are you thinking beyond per frame QP? Does an abstraction on top of QP work for the case you have in mind?

* Reference frame control seems to be https://github.com/w3c/webcodecs/issues/285, there's some interest in this for 2024, so I'd expect progress here.

* Does splitting frames in WebGPU/WebGL work for the use case here? I'm not sure we could do anything internally (we're at the mercy of hardware decode implementations) without implementing such a shader.

> what kind of scheme are you thinking beyond per frame QP

Ideally I'd like to be able to set the CBR / VBR bitrate instead of some vague QP parameter that I manually have to profile to figure out how it corresponds to a bitrate for a given encoder. Of course, maybe encoders don't actually support this? I can't recall. It's been a while.

> Does splitting frames in WebGPU/WebGL work for the use case here? I'm not sure we could do anything internally (we're at the mercy of hardware decode implementations) without implementing such a shader.

I don't think you need a shader. We did it at Oculus Link with existing HW encoders and it worked fine (at least for AMD and NVidia - not 100% sure about Intel's capabilities). It did require some bitmunging to muck with the NVidia H264 bitstream to make the parallel QCOM decoders happy with slices coming from a single encoder session* but it wasn't that significant a problem.

For video streaming, supporting a standard for webcams to be able to deliver slices with timestamped information about the rolling shutter (+ maybe IMU for mobile use cases) would help create a market for premium low-latency webcams. You'd need to figure out how to implement just-in-time rolling shutter corrections on the display side to mitigate the downsides of rolling shutter, but the extra IMU information would be very useful (many mobile camera display packages support this functionality). VR displays often have rolling shutter, so a rolling shutter webcam + display together would really make it possible to do "just in time" corrections for where pixels end up to adjust for latency. I'm not sure how much you'd get out of that, but my hunch is that if you knock out all the details you should be able to shave off nearly a frame of latency glass to glass.
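As a rough illustration of what per-slice rolling shutter timestamps could look like, here is a minimal sketch. It assumes the simplest possible model, a sensor that reads rows top to bottom at a constant rate; real sensors may differ, and the function names and parameters are hypothetical rather than part of any existing webcam or WebCodecs API.

```typescript
// Capture time of a single row, assuming a linear top-to-bottom readout.
function rowCaptureTimeUs(
  frameStartUs: number,      // when row 0 starts reading out
  readoutDurationUs: number, // time to read out the whole frame
  frameHeight: number,
  row: number
): number {
  if (frameHeight < 1 || row < 0 || row >= frameHeight) {
    throw new RangeError("row must be within the frame");
  }
  return frameStartUs + (row / frameHeight) * readoutDurationUs;
}

// Time window covered by a horizontal slice of rowCount rows.
// endUs is the moment the row just past the slice would start reading out.
function sliceCaptureWindowUs(
  frameStartUs: number,
  readoutDurationUs: number,
  frameHeight: number,
  firstRow: number,
  rowCount: number
): { startUs: number; endUs: number } {
  if (frameHeight < 1 || firstRow < 0 || rowCount < 1 || firstRow + rowCount > frameHeight) {
    throw new RangeError("slice must lie within the frame");
  }
  return {
    startUs: frameStartUs + (firstRow / frameHeight) * readoutDurationUs,
    endUs: frameStartUs + ((firstRow + rowCount) / frameHeight) * readoutDurationUs,
  };
}
```

A display-side corrector could then pair each slice's window with IMU samples from the same interval; how that correction is actually computed is outside this sketch.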

Speaking of adjustments, extracting motion vectors from the video is also useful, at least for VR, so that you can give the compositor the relevant information to apply last-minute corrections for that "locked to your motion" feeling (counteracts motion sickness).

On a related note, with HW GPU encoders, it would be nice to have the webcam frame sent from the webcam directly to the GPU instead of round-tripping into a CPU buffer that you then either transport to the GPU or encode on the CPU - this should save a few ms of latency. Think NVidia's Direct standards but extended so that the GPU can grab the frame from the webcam, encode & maybe even send it out over Ethernet directly (the Ethernet part would be particularly valuable for tech like Stadia / GeForce Now). I know the HW standards for that don't actually exist yet, but it might be interesting to explore with NVidia, AMD, and Intel what HW acceleration of that data path might look like.

* NVidia's encoder supports slices directly and has an artificial limit on the number of encoder sessions on consumer drivers (they raised it in the past few years but IIRC it's still anemic). That however means that the generated slices have some incorrect parameters in the bitstream if you want to decode them independently. So you have to muck with the bitstream in a trivial way so that the decoders see independent valid H264 bitstreams they can decode. On AMD you don't have a limit on the number of encoder sessions.
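The comment doesn't say which header fields had to be rewritten, so the sketch below covers only the first step any such bitstream tweak would need: splitting an H.264 Annex B byte stream into NAL units and reading each unit's type. It assumes Annex B framing (`00 00 01` or `00 00 00 01` start codes); `splitAnnexB` and `NalUnit` are names made up for this example.

```typescript
interface NalUnit {
  type: number;        // nal_unit_type: low 5 bits of the first NAL byte
  payload: Uint8Array; // NAL unit bytes, header byte included, start code excluded
}

function splitAnnexB(stream: Uint8Array): NalUnit[] {
  // Record where each start code begins and where the NAL after it begins.
  const starts: { codeStart: number; nalStart: number }[] = [];
  for (let i = 0; i + 2 < stream.length; i++) {
    if (stream[i] === 0 && stream[i + 1] === 0 && stream[i + 2] === 1) {
      // Treat one preceding zero byte as part of a 4-byte start code.
      const codeStart = i > 0 && stream[i - 1] === 0 ? i - 1 : i;
      starts.push({ codeStart, nalStart: i + 3 });
      i += 2; // skip the rest of this start code
    }
  }

  const units: NalUnit[] = [];
  starts.forEach((s, k) => {
    const next = starts[k + 1];
    const end = next ? next.codeStart : stream.length;
    const payload = stream.subarray(s.nalStart, end);
    if (payload.length > 0) {
      units.push({ type: (payload[0] ?? 0) & 0x1f, payload });
    }
  });
  return units;
}
```

In H.264, type 7 is a sequence parameter set, 8 a picture parameter set, 5 an IDR slice and 1 a non-IDR slice. Making slices independently decodable would mean editing fields inside some of those units, which this sketch does not attempt.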

> Ideally I'd like to be able to set the CBR / VBR bitrate

What's wrong with the existing VBR/CBR modes? https://developer.mozilla.org/en-US/docs/Web/API/VideoEncode...
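For readers following along, here is a minimal sketch of setting a target bitrate, assuming the modes referred to are the `bitrate` and `bitrateMode` members of WebCodecs' `VideoEncoderConfig`. The local `EncoderConfigSketch` type mirrors only a subset of that dictionary so the example runs outside a browser; the helper name and the codec string are illustrative.

```typescript
type BitrateMode = "constant" | "variable" | "quantizer";

// Subset of VideoEncoderConfig, redeclared locally for a self-contained example.
interface EncoderConfigSketch {
  codec: string;
  width: number;
  height: number;
  framerate?: number;
  bitrate?: number; // bits per second
  bitrateMode?: BitrateMode;
}

// Returns a copy of the config with a new target bitrate; the input is not mutated.
function withTargetBitrate(
  base: EncoderConfigSketch,
  bitsPerSecond: number,
  mode: "constant" | "variable"
): EncoderConfigSketch {
  if (!Number.isFinite(bitsPerSecond) || bitsPerSecond <= 0) {
    throw new RangeError("bitrate must be a positive number");
  }
  return { ...base, bitrate: Math.round(bitsPerSecond), bitrateMode: mode };
}

const baseConfig: EncoderConfigSketch = {
  codec: "avc1.42001f", // example H.264 codec string
  width: 1280,
  height: 720,
  framerate: 60,
};
const cbrConfig = withTargetBitrate(baseConfig, 4000000, "constant");
const loweredConfig = withTargetBitrate(cbrConfig, 1500000, "constant");
// In a browser, these would be passed to encoder.configure(cbrConfig) and,
// later, encoder.configure(loweredConfig) to request a different bitrate.
```

Whether a second `configure()` call mid-stream changes the bitrate seamlessly or resets the encoder is implementation-dependent, which may be part of what the original complaint is about.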

> I don't think you need a shader...

Ah I see what you mean. It'd probably be hard for us to standardize this in a way that worked across platforms which likely precludes us from doing anything quickly here. The stuff easiest to standardize for WebCodecs is stuff that's already standardized as part of the relevant codec spec (e.g, AVC, AV1, etc) and well supported on a significant range of hardware.

> ... instead of round-tripping into a CPU buffer

We're working on optimizing this in 2024. We do avoid CPU buffers in some cases, but not as many as we could.

> It'd probably be hard for us to standardize this in a way that worked across platforms which likely precludes us from doing anything quickly here. The stuff easiest to standardize for WebCodecs is stuff that's already standardized as part of the relevant codec spec (e.g, AVC, AV1, etc) and well supported on a significant range of hardware.

As I said, Oculus Link worked with off-the-shelf encoders. Only the Nvidia one needed some special work, and even that’s no longer needed since they raised the number of encoders (and the amount of work was really trivial - just adjusting some header information in the H.264 framing). I think all you really need is the ability to either slice a VideoFrame into strips at zero cost and have the user feed them into separate encoders OR to request sliced encoding and have that implemented under the hood however the backend sees fit (either multiple encoder sessions or using the Nvidia slice API if using nvenc). You can even make support for sliced encoding optional and implement it just for the backends where it’s doable.
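A minimal sketch of the "slice a VideoFrame into strips" half of that idea, written as pure geometry so it runs anywhere. The helper name is made up, and snapping strip boundaries to even rows is an assumption made to keep the rectangles valid for 4:2:0 subsampled formats. In a browser each rectangle could be passed as `visibleRect` when constructing a `VideoFrame` from the source frame; whether that is actually zero-copy depends on the implementation.

```typescript
interface StripRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Splits a frame into n full-width strips stacked top to bottom.
// Interior boundaries are snapped down to a multiple of `align` rows;
// the last strip absorbs any remainder.
function horizontalStrips(
  frameWidth: number,
  frameHeight: number,
  n: number,
  align: number = 2
): StripRect[] {
  if (!Number.isInteger(n) || n < 1 || !Number.isInteger(align) || align < 1) {
    throw new RangeError("n and align must be positive integers");
  }
  const strips: StripRect[] = [];
  let y = 0;
  for (let i = 0; i < n; i++) {
    const idealEnd = ((i + 1) * frameHeight) / n;
    const end = i === n - 1 ? frameHeight : Math.floor(idealEnd / align) * align;
    if (end <= y) {
      throw new RangeError("frame is too short for this many aligned strips");
    }
    strips.push({ x: 0, y, width: frameWidth, height: end - y });
    y = end;
  }
  return strips;
}

// Browser-side usage (not executed here):
//   const parts = horizontalStrips(frame.codedWidth, frame.codedHeight, 4)
//     .map((rect) => new VideoFrame(frame, { visibleRect: rect }));
//   parts.forEach((part, i) => encoders[i].encode(part));
```

Each strip would then go to its own encoder configured for the strip's dimensions, with the receiver stitching decoded strips back together by their `y` offset.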