A quick search for "latency" in here turns up one little hand-wavey blurb about Mux working to optimize HLS:
>Using various content delivery networks, Mux is driving HTTP Live Streaming (HLS) latency down to the lowest levels possible levels, and partnering with the best services at every mile of delivery is crucial in supporting this continued goal.
In my experience, HLS and even LL-HLS are a nightmare for latency. I jokingly call it "High Latency Streaming", since it seems very hard to (reliably) get glass-to-glass latency into the LL range (under 4 seconds). Usually, latency with cloud streaming ends up at 30 s or more.
I've dabbled with implementing WebRTC solutions to get Ultra Low Latency (<1 s) delivery, but that is even more complicated, and fragmented, with all of the browsers vying for standardization. The solution I've cooked up in the lab with mediasoup requires an FFmpeg shim to convert from MPEG-TS/H.264 via UDP/SRT to MKV/VP9 via RTP, which of course drives up the latency. Mediasoup also has a ton of opinionated quirks for RTP ingest. Still, I've been able to prove out 400 ms "glass-to-glass", which has been fun.
I wonder if Mux, or really anyone, intends to deliver scalable cloud or on-prem solutions to fill the web-native LL/Ultra LL void left by the death of Flash. I'm aware of some niche solutions like Softvelum's Nimble Streamer, but I hate their business model and I don't know anything about their scalability.
The trick, which maybe you don't want to do in production, is to mux the video on a per-client basis. Every wss server process gets the same H.264 elementary stream with occasional IDRs and links with libavformat (or knows how to produce an MP4 frame for an H.264 NAL). Each client receives essentially the same sequence of H.264 NALs, but in an MP4 container made just for it, with (very occasional) skipped frames so the server can limit the client-side buffer.
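To make the per-client part concrete, here is a minimal sketch of the state one client's muxer might keep. It is not the Puffer/Stagecast code: `Frame`, `MuxedSample`, and `ClientMuxer` are hypothetical names, the actual MP4 box writing (which libavformat would do) is left out, and it assumes a constant frame rate with no B-frames so decode order equals display order. The point is that timestamps come from the client's own sample count, so a frame the server never hands to this muxer leaves no hole in the timeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One access unit from the shared H.264 elementary stream (hypothetical type)."""
    data: bytes
    is_idr: bool


@dataclass(frozen=True)
class MuxedSample:
    """What a per-client MP4 muxer would write for one frame (hypothetical type)."""
    data: bytes
    is_sync: bool     # would become the sync-sample flag in the MP4 fragment
    decode_time: int  # in timescale ticks, local to this client's timeline
    duration: int


class ClientMuxer:
    """Hypothetical per-client state: join at the next IDR, stamp samples contiguously.

    Assumes constant frame rate and no B-frames (decode order == display order).
    """

    def __init__(self, timescale: int = 90_000, fps: int = 60) -> None:
        self.sample_duration = timescale // fps  # 1500 ticks at 90 kHz, 60 fps
        self.next_decode_time = 0
        self.started = False

    def push(self, frame: Frame) -> Optional[MuxedSample]:
        if not self.started:
            if not frame.is_idr:
                return None  # a newly joined client can't decode anything before an IDR
            self.started = True
        sample = MuxedSample(frame.data, frame.is_idr,
                             self.next_decode_time, self.sample_duration)
        self.next_decode_time += self.sample_duration
        return sample
```

If the server skips a frame by simply not calling `push` for it, the next sample gets the very next decode time, which is the "like that frame never happened" property described below.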
When a client joins, the server starts sending the video from the next IDR. The client runs a JavaScript function on a timer that periodically reports its sourceBuffer duration back to the server over the same WebSocket. If the server is unhappy that the client-side buffer has stayed too long (e.g., the minimum sourceBuffer duration has remained over 150 ms for an extended period, and we haven't skipped any frames in a while), it just doesn't write the last frame before the next IDR into the MP4. From an MP4 timestamping perspective, it's as if that frame never happened, and nothing is missing. At 60 fps, and done only occasionally, this is not easily noticeable, and each skipped frame reduces the buffer by about 17 ms (1/60 s). We do the same for the Opus audio (without worrying about IDRs).
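The skip decision could look roughly like the sketch below. Only the 150 ms threshold comes from the description above; the "extended period" (`hold_s`) and "in a while" (`cooldown_s`) values are guesses, and `SkipController`/`LookaheadSkipper` are hypothetical names. To know that a frame is the last one before an IDR, this sketch holds back one frame and looks at the next.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    """One encoded frame from the shared stream (hypothetical type)."""
    data: bytes
    is_idr: bool


class SkipController:
    """Hypothetical policy for when to drop the frame just before an IDR.

    Skips only if every buffer report for at least hold_s seconds exceeded
    max_buffer_ms and no skip happened in the last cooldown_s seconds.
    hold_s and cooldown_s are assumed values, not from the original setup.
    """

    def __init__(self, max_buffer_ms: float = 150.0,
                 hold_s: float = 2.0, cooldown_s: float = 5.0) -> None:
        self.max_buffer_ms = max_buffer_ms
        self.hold_s = hold_s
        self.cooldown_s = cooldown_s
        self.over_since: Optional[float] = None  # start of the current "too long" streak
        self.last_skip_s = float("-inf")

    def report(self, now_s: float, buffer_ms: float) -> None:
        """Called whenever the client reports its sourceBuffer duration."""
        if buffer_ms > self.max_buffer_ms:
            if self.over_since is None:
                self.over_since = now_s
        else:
            self.over_since = None  # a single low report resets the streak

    def should_skip(self, now_s: float) -> bool:
        return (self.over_since is not None
                and now_s - self.over_since >= self.hold_s
                and now_s - self.last_skip_s >= self.cooldown_s)

    def mark_skipped(self, now_s: float) -> None:
        self.last_skip_s = now_s


class LookaheadSkipper:
    """Holds back one frame so it knows whether that frame precedes an IDR."""

    def __init__(self, ctrl: SkipController) -> None:
        self.ctrl = ctrl
        self.pending: Optional[Frame] = None

    def push(self, frame: Frame, now_s: float) -> List[Frame]:
        """Returns the frames to hand to this client's muxer, in order."""
        out: List[Frame] = []
        if self.pending is not None:
            if frame.is_idr and self.ctrl.should_skip(now_s):
                self.ctrl.mark_skipped(now_s)  # drop pending: never written to the MP4
            else:
                out.append(self.pending)
        self.pending = frame
        return out
```

Holding back a frame costs about one frame time (~17 ms at 60 fps) on the server. If the IDR cadence is fixed or known ahead of time, the server could identify the last frame of each GOP without the lookahead. For the audio track, which has no IDR constraint, the same `should_skip` check could gate any packet.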
In our experience, you can use this to reliably trim the client-side buffer to <70 ms if that's where you want to fall on the latency-vs.-stall tradeoff curve. The CPU overhead of per-client muxing is in the noise, but it's obviously not something today's CDNs will do for you by default. Maybe it's even possible to skip the per-client muxing and just surgically omit the MP4 frame before an IDR (which would lead to a timestamp glitch, but maybe that's OK?), but we haven't tried this. You also want to make sure to go through the (undocumented) hoops to put Chrome's MP4 demuxer in "low delay mode": see https://source.chromium.org/chromium/chromium/src/+/main:med... and https://source.chromium.org/chromium/chromium/src/+/main:med...
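As a back-of-the-envelope check on how long trimming takes: each skip removes one frame time, and skips can only land just before an IDR and no more often than the cooldown allows. The sketch below assumes a fixed IDR interval and a cooldown, both hypothetical parameters, and ignores any growth or shrinkage of the buffer from network jitter, so treat the result as a rough estimate only.

```python
import math


def skips_needed(buffer_ms: float, target_ms: float, fps: float = 60.0) -> int:
    """Frames to drop to bring buffer_ms down to target_ms; each drop removes 1000/fps ms."""
    excess_ms = buffer_ms - target_ms
    if excess_ms <= 0:
        return 0
    # excess_ms * fps / 1000 is the excess measured in frame durations
    return math.ceil(excess_ms * fps / 1000.0)


def rough_trim_time_s(buffer_ms: float, target_ms: float,
                      idr_interval_s: float, cooldown_s: float,
                      fps: float = 60.0) -> float:
    """Rough time to reach target when skips land only on IDRs and respect a cooldown."""
    # The next allowed skip is the first IDR at or after the cooldown expires.
    gops_per_skip = max(1, math.ceil(cooldown_s / idr_interval_s))
    return skips_needed(buffer_ms, target_ms, fps) * gops_per_skip * idr_interval_s
```

For example, going from a 150 ms buffer to 70 ms at 60 fps takes `skips_needed(150, 70) == 5` skips; with an assumed 2 s IDR interval and 5 s cooldown, that is roughly 30 s of gradual trimming.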
We're using the WebSocket technique "in production" at https://puffer.stanford.edu, but without the frame skipping since there we're trying to keep the client's buffer closer to 15 seconds. We've only used the frame-skipping and per-client MP4 muxing in more limited settings (https://taps.stanford.edu/stagecast/, https://stagecast.stanford.edu/) but it worked great when we did. Happy to talk more if anybody is interested.
[If you want to get below 150 ms, I think you're looking at WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g., https://snr.stanford.edu/salsify/), but realistically you start to bump up against capture and display latencies. From a UVC webcam, I don't think we've been able to get an image to the host faster than ~50 ms from start of exposure, even capturing at 120 fps with a short exposure time.]