by jampekka 968 days ago
I'm currently working with WebCodecs to get (the long-awaited) frame-by-frame seeking and reverse playback working in the browser. It even seems to work, although the VideoDecoder queuing logic is giving me some grief. Any tips on figuring out how many chunks have to be queued before a specific VideoFrame pops out?

An aside: to work with video/container files, be sure to check out the libav.js project, which can be used to demux streams (WebCodecs doesn't do this) and even as a polyfill decoder for browsers without WebCodecs support!

https://github.com/Yahweasel/libav.js/


The number of frames necessary is going to depend on the codec and bitstream parameters. If it's H.264 or H.265, there's some more discussion and links here: https://github.com/w3c/webcodecs/issues/698#issuecomment-161...

The optimizeForLatency parameter may also help in some cases: https://developer.mozilla.org/en-US/docs/Web/API/VideoDecode...
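
For illustration, a minimal sketch of setting that hint. The interface below is only a structural subset of the WebCodecs VideoDecoderConfig dictionary, and the helper name `withLowLatency` and the codec string are assumptions made for the example, not anything from the linked docs.

```typescript
// Structural subset of the WebCodecs VideoDecoderConfig dictionary, declared
// locally so the sketch does not depend on browser type definitions.
interface DecoderConfigSketch {
  codec: string;
  codedWidth?: number;
  codedHeight?: number;
  optimizeForLatency?: boolean;
}

// Hypothetical helper: returns a copy of `base` with the low-latency hint set.
// The hint asks the decoder to buffer as few chunks as it can before emitting
// a frame; how far that goes still depends on the codec and the backend.
function withLowLatency(base: DecoderConfigSketch): DecoderConfigSketch {
  return { ...base, optimizeForLatency: true };
}

// In a browser the result would be handed to the decoder, for example:
//   decoder.configure(withLowLatency({ codec: "avc1.42E01E" }));
```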

Thanks. I appreciate that making an API that can be implemented with the wide variety of decoding implementations is not an easy task.

But to be specific, this is a bit problematic with I-frame-only videos too, even with optimizeForLatency enabled (which does make the queue shorter). I can of course call .flush() to get the frames out, but this is too slow for smooth playback.

I think I could just keep pushing chunks until I see the frame I want come out, but that would have to be done in an async "busy loop", which feels a bit nasty. I think this is also what the "official" examples do, though.
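
Roughly, the loop I have in mind would look something like the sketch below. `PushDecoder`, `decodeUntilSeen`, `frameSeen` and `yieldToEventLoop` are all hypothetical names made up for this example; only `decode()` mirrors the real VideoDecoder method.

```typescript
// Structural subset of the WebCodecs VideoDecoder used by this sketch.
interface PushDecoder<Chunk> {
  decode(chunk: Chunk): void;
}

// Hypothetical helper: submits chunks one at a time until `frameSeen()`
// reports that the wanted frame has been output, or the chunks run out.
// Resolves to the number of chunks that were submitted.
async function decodeUntilSeen<Chunk>(
  decoder: PushDecoder<Chunk>,
  chunks: readonly Chunk[],
  frameSeen: () => boolean,
  yieldToEventLoop: () => Promise<void>,
): Promise<number> {
  let submitted = 0;
  for (const chunk of chunks) {
    if (frameSeen()) break;
    decoder.decode(chunk);
    submitted += 1;
    // Give the decoder's output callback a chance to run before checking again.
    await yieldToEventLoop();
  }
  return submitted;
}
```

In a browser, `yieldToEventLoop` could be something like `() => new Promise<void>((resolve) => setTimeout(resolve, 0))`, and `frameSeen` would check a flag that the decoder's output callback sets once it sees the wanted `frame.timestamp`.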

Something like an "enqueue" event (analogous to dequeue), signaling that more chunks are needed after the last .decode() to saturate the decoder, would allow for a clean implementation. I don't know if this is possible with all backends, though.

Often Chrome doesn't know when more frames are needed either, so it's not something we could add an API for unfortunately.

Yes, just feeding inputs 1 by 1 for each dequeue event until you get the number of outputs you want in your steady state is the best way. It minimizes memory usage. I'll see about updating the MDN documentation to state this better.
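
A minimal sketch of that pattern, under some assumptions: the interface is a structural subset of VideoDecoder (`decode()` and the `ondequeue` handler), while `DequeueFeeder`, `noteOutput` and `noteConsumed` are hypothetical names for this example. The caller is assumed to call `noteOutput()` from the decoder's output callback and `noteConsumed()` once a frame has been rendered and closed.

```typescript
// Structural subset of the WebCodecs VideoDecoder that this sketch relies on.
interface DequeueDecoder<Chunk> {
  decode(chunk: Chunk): void;
  ondequeue: ((event: unknown) => void) | null;
}

// Hypothetical helper: feeds one chunk per dequeue event until
// `wantedOutputs` frames are waiting to be consumed.
class DequeueFeeder<Chunk> {
  private next = 0;    // index of the next chunk to submit
  private pending = 0; // frames output but not yet consumed

  constructor(
    private readonly decoder: DequeueDecoder<Chunk>,
    private readonly chunks: readonly Chunk[],
    private readonly wantedOutputs: number,
  ) {
    decoder.ondequeue = () => this.feedOne();
  }

  // Number of chunks submitted so far.
  get submitted(): number {
    return this.next;
  }

  // Primes the decoder with the first chunk; later chunks follow on dequeue.
  start(): void {
    this.feedOne();
  }

  // Call from the decoder's output callback for every frame received.
  noteOutput(): void {
    this.pending += 1;
  }

  // Call when a frame has been consumed; restarts feeding if it had stopped.
  noteConsumed(): void {
    this.pending -= 1;
    this.feedOne();
  }

  private feedOne(): void {
    if (this.pending >= this.wantedOutputs) return; // steady state reached
    const chunk = this.chunks[this.next];
    if (chunk === undefined) return; // nothing left to feed
    this.next += 1; // advance before decode() in case dequeue fires re-entrantly
    this.decoder.decode(chunk);
  }
}
```

Because the decoder is only fed in response to dequeue events or consumed frames, at most one extra chunk is submitted per event. In a real browser the output and dequeue callbacks run as separate tasks, so the decoder may end up a chunk or two ahead of the target.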

Wow, great to see some work in this space. I've been wanting to do reverse playback, frame-accurate seeking, and step-by-step forward and backward rendering in the browser for esports game analysis. The regular video tag gets you some of the way there, but navigating frame by frame will sometimes jump an extra frame. Likewise, trying to stop at an exact point will often land 1 or 2 frames off from where you should be. Firefox is much worse: when pausing at a given time you can end up ±12 frames from where you should be.

I must find some time to dig into this, thanks for sharing it.

I have it working with WebCodecs, but currently only for I-frame-only videos, and all the decoded frames are read into memory. It's not impossible to lift these restrictions, but the current WebCodecs API will likely make it a bit brittle (and/or janky). For my current case this is not a big problem, so I haven't fought with it too much.

Figuring out libav.js demuxing may be a bit of a challenge, even though the API is quite nice as traditional AV APIs go. I'll put out my small wrapper for these in a few days.

Edit: to be clear, I don't have anything to do with libav.js other than happening to find it and using it to scratch my itch. Most demuxing examples for WebCodecs use mp4box.js, which really makes one uncomfortably intimate with the guts of the MP4 format.