by GeneticGenesis 2100 days ago
Great question. This is a large, complicated topic, but here's a quick overview. For context, I've been building live-streaming video infrastructure for a little over 10 years.

At a fundamental level, yes, encoding is one of the most expensive components of a live-streaming system (at low scale). Honestly, your guess of 2 bitrates for each video is on the low end: on average, most platforms create about 5 different qualities for any one stream, ranging from ~500 kbps to 5+ Mbps.
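To make the ladder idea concrete, here is a minimal sketch of a five-rung ladder in that ~500 kbps to 5 Mbps range. The resolutions and bitrates are hypothetical values picked for illustration, not any particular platform's settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    video_kbps: int


# Hypothetical five-rung ladder spanning ~500 kbps to ~5 Mbps (illustrative values only).
LADDER = [
    Rendition("1080p", 1920, 1080, 5000),
    Rendition("720p", 1280, 720, 3000),
    Rendition("480p", 854, 480, 1500),
    Rendition("360p", 640, 360, 800),
    Rendition("240p", 426, 240, 500),
]


def total_output_kbps(ladder):
    """Sum of the video bitrates the transcoder must produce for one input stream."""
    return sum(r.video_kbps for r in ladder)


print(len(LADDER), total_output_kbps(LADDER))
```

Every rung is a separate encode of the same input, so the transcoder's total output in this example (10.8 Mbps) is more than double the top rendition alone.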

If you look at the pricing of modern video platforms, you can see the high cost of ingest and transcode captured in their pricing:

AWS IVS - $2.00 per input hour
API.video - $2.40 per input hour
Mux - $4.20 per input hour
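To put those per-input-hour rates in perspective, here is a back-of-the-envelope calculation of what one channel streaming a few hours a day would cost at each of the list prices quoted above. It is only a rough sketch: it ignores delivery/egress charges, tiering, and discounts, and the prices may have changed since they were quoted.

```python
# Per-input-hour list prices quoted above (USD). Delivery/egress is NOT included.
PRICE_PER_INPUT_HOUR = {
    "AWS IVS": 2.00,
    "API.video": 2.40,
    "Mux": 4.20,
}


def monthly_ingest_cost(hours_per_day: float, days: int = 30) -> dict:
    """Ingest/transcode cost for one channel live `hours_per_day` hours for `days` days."""
    hours = hours_per_day * days
    return {name: round(price * hours, 2) for name, price in PRICE_PER_INPUT_HOUR.items()}


# One channel live 4 hours a day for 30 days = 120 input hours.
print(monthly_ingest_cost(4))
```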

Generally, there isn't much use of GPU or ASIC hardware acceleration for h.264 processing today. FFmpeg with x264 is plenty "fast enough" when tuned on commodity x86 hardware with things like AVX extensions. Transcoding a 4-6 second segment of video should generally take a couple of hundred milliseconds at most.
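A handy way to sanity-check a claim like "a 4-6 second segment in a couple of hundred milliseconds" is the speed relative to real time, plus a check that all renditions of one segment finish before the next segment arrives. The sketch below assumes renditions are spread evenly across cores and encoded concurrently; that scheduling model is a simplification for illustration, not something described above, and it ignores decode and scheduling overhead.

```python
def speed_vs_realtime(segment_seconds: float, encode_seconds: float) -> float:
    """Media seconds encoded per wall-clock second (> 1 means faster than real time)."""
    return segment_seconds / encode_seconds


def keeps_up(segment_seconds: float, encode_seconds_per_rendition: float,
             renditions: int, cores: int) -> bool:
    """True if all renditions of one segment finish before the next segment arrives.

    Simplified model (an assumption): each core encodes one rendition at a time,
    so renditions are processed in ceil(renditions / cores) parallel "waves".
    """
    waves = -(-renditions // cores)  # ceiling division
    return waves * encode_seconds_per_rendition <= segment_seconds


# A 4 s segment encoded in 200 ms is about 20x faster than real time.
print(speed_vs_realtime(4.0, 0.2))
# 5 renditions on 2 cores = 3 waves of 0.2 s each, well inside a 4 s segment.
print(keeps_up(4.0, 0.2, renditions=5, cores=2))
```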

As for how the larger platforms keep live streaming cost-competitive: the number of qualities (and often the codecs used) varies with the number of viewers, so they're not wasting resources encoding for a couple of viewers. Many platforms also implement just-in-time encoding to limit the amount of content that's transcoded when it isn't being viewed.
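A toy version of a viewer-dependent policy might look like the following. The viewer thresholds and rung choices are hypothetical, picked only to show the shape of the idea; the zero-viewer case is a crude stand-in for just-in-time encoding (do nothing until someone is watching). Real platforms tune these from their own cost and quality data.

```python
# Hypothetical policy: (minimum concurrent viewers, transcoded rungs below the source).
# The source rendition itself is always passed through (transmuxed) when anyone watches.
POLICY = [
    (1000, ["720p", "480p", "360p", "240p"]),
    (100, ["720p", "360p"]),
    (10, ["360p"]),
    (1, []),  # source only: transmux, no re-encode at all
]


def renditions_for(viewers: int) -> list:
    """Choose which renditions to produce for the current audience size."""
    if viewers < 1:
        return []  # nobody watching: defer all packaging/transcoding until requested
    for min_viewers, rungs in POLICY:
        if viewers >= min_viewers:
            return ["source"] + rungs
    return ["source"]


for v in (0, 5, 50, 500, 5000):
    print(v, renditions_for(v))
```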

Some platforms also drop all the way down to a transmux for streams with very low viewer numbers. Transmux in this context just changes the packaging of the inbound stream without changing the actual encoded picture data.
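With FFmpeg, the difference between a transmux and a transcode largely comes down to whether the streams are copied (`-c copy`) or re-encoded. The sketch below only builds the argument lists (it doesn't run anything); the input URL, playlist name, and bitrate are hypothetical placeholders, and a production pipeline would need more options than this.

```python
import shlex


def transmux_cmd(src_url: str, playlist: str, segment_seconds: int = 4) -> list:
    """ffmpeg argv that repackages the input as HLS without re-encoding."""
    return [
        "ffmpeg",
        "-i", src_url,
        "-c", "copy",                       # copy audio/video bitstreams untouched
        "-f", "hls",
        "-hls_time", str(segment_seconds),  # target length; with copy, cuts land on keyframes
        playlist,
    ]


def transcode_cmd(src_url: str, playlist: str, width: int, height: int,
                  video_kbps: int, segment_seconds: int = 4) -> list:
    """For contrast: decode and re-encode the video to one lower rendition with x264."""
    return [
        "ffmpeg",
        "-i", src_url,
        "-vf", f"scale={width}:{height}",
        "-c:v", "libx264", "-preset", "veryfast",
        "-b:v", f"{video_kbps}k",
        "-c:a", "copy",                     # audio is left as-is here
        "-f", "hls",
        "-hls_time", str(segment_seconds),
        playlist,
    ]


# Hypothetical input URL and output name.
print(shlex.join(transmux_cmd("rtmp://localhost/live/stream", "out.m3u8")))
print(shlex.join(transcode_cmd("rtmp://localhost/live/stream", "360p.m3u8", 640, 360, 800)))
```

Because a transmux never decodes or encodes pictures, its CPU cost is tiny compared with running x264 for each rung.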

It's also worth considering, from a business perspective, that many UGC live platforms take a loss on small streamers with few viewers and cover it with the ad and subscription revenue from larger streamers.

I hope that helps!


Thanks for the info. I did not say there are two streams. I said there are two levels of transcoding: first, normalizing the ingest into some standard codec and container, and second, producing the lower-bitrate variants. That doesn't mean just one variant; as you said, it is/can be 240p, 360p, 480p, 720p, 1080p (source) and so on. In other words, the encoding takes place at least twice.

> FFmpeg (x264) is plenty "fast enough" when tuned on commodity X86 hardware, when you have things like AVX extensions. Generally transcoding a 4-6 second segment of video should only be taking a couple of hundred milliseconds at a maximum

Can you elaborate on this? I find that hard to believe from what I have seen and from the ton of "ffmpeg is slow" Google search results. If that is truly the case, there might be light at the end of the tunnel. Anecdotally, today I encoded a 2-second h264 MP4 into AV1 and it took only 2-3 minutes... VP9 took 8 seconds (still 4x the clip length). Sure, that was not on a server, but it's not like I was computing pi in the background.

Sure, but this was my point: x264 is very fast on commodity hardware, and the overwhelming majority of video delivered is still h.264, especially live video, and in particular low-viewership, live user-generated content.

If you want/need to transcode to VP9 or AV1 at scale with good quality, yes, you'll absolutely need GPU or ASIC accelerated encoders, of which there are a couple for VP9, and none for AV1 today.

h.264 will get you to 95-99% market penetration on the device landscape.

OK, I see what you mean. VP9 usually gets me 1/4 or 1/5 of the h264 file size, so I thought it was worth it to save the network traffic (AV1 even got me a larger file than VP9, at an incredibly slow speed). But as you pointed out, without direct hardware support like h264 has these days, it becomes a bottleneck, so I would have to sacrifice traffic for speed and practicality. I guess VP9/AV1 makes sense for YouTube, BitChute, or simply a video HOSTING service, not for live streaming. Thanks for lighting the bulb above my head :D

No worries!

While some of the larger UGC platforms do use VP9 for live streaming, it's only for a limited subset of streams with high concurrent viewership, and yes, in many cases those encodes aren't running on commodity hardware.
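One way to see why VP9 only pays off for high-concurrency streams: its extra encode cost is paid once per stream, while its bandwidth savings scale with the number of viewers. The break-even sketch below is a simplification that treats every viewer as watching one rendition for the whole hour, and all inputs (bitrate, size ratio, encode cost, egress price) are hypothetical parameters, not real price quotes.

```python
def gb_per_viewer_hour(kbps: float) -> float:
    """Data delivered to one viewer watching for one hour at `kbps` (decimal GB)."""
    return kbps * 1000 * 3600 / 8 / 1e9


def break_even_viewers(h264_kbps: float, size_ratio: float,
                       extra_encode_cost_per_hour: float,
                       egress_cost_per_gb: float) -> float:
    """Concurrent viewers at which VP9's egress savings cover its extra encode cost.

    size_ratio: VP9 bitrate as a fraction of h.264 at similar quality (hypothetical input).
    extra_encode_cost_per_hour: added cost of producing the VP9 encode (hypothetical).
    egress_cost_per_gb: CDN delivery price (hypothetical).
    """
    saved_gb = gb_per_viewer_hour(h264_kbps) * (1 - size_ratio)
    return extra_encode_cost_per_hour / (egress_cost_per_gb * saved_gb)


# Hypothetical example: 3 Mbps h.264, VP9 at 60% of that, $1/hour extra encode, $0.02/GB egress.
print(gb_per_viewer_hour(3000))
print(break_even_viewers(3000, 0.6, 1.0, 0.02))
```

With these made-up numbers the switch pays for itself at roughly 90+ concurrent viewers; below that, h.264 alone is cheaper.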

As for AV1, there really aren't any live implementations ready right now. A few have been demoed, but I'm not aware of any deployments today.

I have checked [1]: VP9 already has generally available hardware support, and just a few days ago Intel released their Tiger Lake CPUs, which support AV1 hardware decoding [2].

[1] https://en.wikipedia.org/wiki/VP9#Hardware_implementations

[2] https://en.wikipedia.org/wiki/AV1#Hardware