Great question. This is a large, complicated topic, but here's a quick overview. For context, I've been building live streaming video infrastructure for a little over 10 years now.

At a fundamental level, yes, encoding is one of the most expensive components of a live-streaming system (at low scale), and honestly, your guess of 2 bitrates for each video is very much on the low end. Most platforms create about 5 different qualities for any one stream, ranging from ~500 kbps to ~5+ Mbps.

If you look at the pricing of modern video platforms, you can see the high cost of ingest and transcode reflected in what they charge:

AWS IVS - $2.00 per input hour.
API.video - $2.40 per input hour.
Mux - $4.20 per input hour.

Generally there isn't too much use of hardware acceleration on ASIC or GPU for H.264 processing today. FFmpeg (x264) is plenty "fast enough" when tuned on commodity X86 hardware, when you have things like AVX extensions. Generally transcoding a 4-6 second segment of video should only be taking a couple of hundred milliseconds at a maximum.

As for how the larger platforms keep live streaming cost competitive: the number of qualities (and often the codecs used) varies with the number of viewers you have, so that they're not wasting resources encoding for a couple of viewers. Many platforms also implement just-in-time encoding, to limit the amount of content that's transcoded when it isn't being viewed. Some platforms drop all the way down to a transmux for streams with very low viewer numbers. Transmux in this context just changes the packaging of the inbound stream without changing the actual encoded picture data.

It's also worth considering from a business perspective that many UGC live platforms will be taking a loss on small streamers with a low number of viewers, and covering that with the revenue from larger streamers with ad revenue / subscribers.

I hope that helps!
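
To make the viewer-based scaling described above a bit more concrete, here is a minimal sketch in Python. Everything specific in it is an assumption for illustration: the `Rendition` type, the five-step ladder (chosen only to span the ~500 kbps to ~5 Mbps range mentioned above), the viewer thresholds, and the mode names are hypothetical and not taken from any real platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical 5-step ladder spanning roughly 500 kbps to 5 Mbps.
FULL_LADDER = [
    Rendition("1080p", 5000),
    Rendition("720p", 3000),
    Rendition("480p", 1500),
    Rendition("360p", 800),
    Rendition("240p", 500),
]


def plan_outputs(viewers: int) -> dict:
    """Decide how much encoding work a stream gets, based on viewer count.

    All thresholds below are made up for illustration.
    """
    if viewers < 0:
        raise ValueError("viewers must be non-negative")
    if viewers == 0:
        # Just-in-time idea: nobody is watching, so transcode nothing.
        return {"mode": "idle", "renditions": []}
    if viewers < 5:
        # Transmux: repackage the inbound stream, no re-encode.
        return {"mode": "transmux", "renditions": []}
    if viewers < 50:
        # Small audience: a reduced ladder (one higher, one lower quality).
        return {"mode": "transcode", "renditions": [FULL_LADDER[1], FULL_LADDER[3]]}
    # Larger audience: the full ladder.
    return {"mode": "transcode", "renditions": list(FULL_LADDER)}


if __name__ == "__main__":
    for count in (0, 3, 20, 1000):
        plan = plan_outputs(count)
        names = [r.name for r in plan["renditions"]]
        print(count, plan["mode"], names)
```

A real system would presumably also add hysteresis so a stream hovering around a threshold doesn't flip between modes, but that is beyond what the comment above describes.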
> FFmpeg (x264) is plenty "fast enough" when tuned on commodity X86 hardware, when you have things like AVX extensions. Generally transcoding a 4-6 second segment of video should only be taking a couple of hundred milliseconds at a maximum
Can you elaborate on this? I find that hard to believe from what I have seen and from the ton of "ffmpeg is slow" Google search results. If that is truly the case, there might be light at the end of the tunnel. Anecdotally, today I encoded a 2-second H.264 MP4 into AV1 and it took only 2-3 minutes... VP9 took 8 seconds (still 4x real time). Sure, that was not on a server, but still, it's not like I am computing pi in the background.
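
For comparing the figures in this exchange on one scale, a small sketch of the "times real time" arithmetic may help. The helper name `realtime_factor` is hypothetical; the inputs are just the numbers quoted in the thread (a 2-second clip, 8 seconds for VP9, 2-3 minutes for AV1, and "a couple of hundred milliseconds" for a 4-6 second H.264 segment, taken here as 0.2 s for a 4 s segment).

```python
def realtime_factor(encode_seconds: float, media_seconds: float) -> float:
    """Encode time divided by media duration.

    Values below 1.0 mean faster than real time, which a live
    transcode needs in order to keep up with the incoming stream.
    """
    if media_seconds <= 0:
        raise ValueError("media_seconds must be positive")
    return encode_seconds / media_seconds


if __name__ == "__main__":
    # Figures quoted in the thread; none of these are new measurements.
    print("VP9, 8 s for a 2 s clip:", realtime_factor(8, 2))
    print("AV1, 2 min for a 2 s clip:", realtime_factor(120, 2))
    print("AV1, 3 min for a 2 s clip:", realtime_factor(180, 2))
    print("x264, 0.2 s for a 4 s segment:", realtime_factor(0.2, 4))
```

The H.264 figure and the AV1/VP9 figures come from different codecs, settings, and hardware, so the ratio only puts them on a common scale; it says nothing about why they differ.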