

by mmcclure 2171 days ago

This definitely isn't ignorance, it's a very, very common question. The TL;DR on it is: cost.

The most cost-effective way of delivering video is using some form of HTTP streaming (like HLS or DASH). In a nutshell, the player downloads a manifest that tells it where to find chunks of video, which are downloaded through and cached by normal, commodity CDNs. Everything is stateless and scales like any other form of HTTP download. To do all of this you end up needing to transcode the incoming stream. Every step of this process introduces latency, and for reference 20 seconds is perfectly normal HLS latency, so credit where credit's due, this is really impressive. One of my colleagues wrote about the state of low latency last year[1], and considered < 4 seconds to be "ultra low latency"... that's really rare, particularly among platforms.
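To make the "manifest that points at chunks" idea concrete, here is a minimal sketch of what a player does with an HLS media playlist. The playlist text, the segment names, and the cdn.example.com URL are all made up for illustration; only the `#EXT...` tags are standard HLS. A real player also handles master playlists, bitrate switching, and reloading the playlist as new segments appear.

```python
# Sketch: read an HLS media playlist and list the chunks it points at.
# SAMPLE_PLAYLIST and the cdn.example.com URL are hypothetical.
from urllib.parse import urljoin

SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:3
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:3.0,
segment120.ts
#EXTINF:3.0,
segment121.ts
#EXTINF:3.0,
segment122.ts
"""


def parse_media_playlist(text, playlist_url):
    """Return a list of (duration_seconds, absolute_url), one per segment."""
    segments = []
    pending_duration = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,<optional title>"
            pending_duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#"):
            # A non-tag line is a segment URI, relative to the playlist URL.
            if pending_duration is not None:
                segments.append((pending_duration, urljoin(playlist_url, line)))
            pending_duration = None
    return segments


segments = parse_media_playlist(
    SAMPLE_PLAYLIST, "https://cdn.example.com/live/stream.m3u8"
)
for duration, url in segments:
    print(f"{duration:.1f}s  {url}")
```

Each of those URLs is then fetched with an ordinary HTTP GET, which is why any commodity CDN can serve them.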

You can get lower latency, of course, but typically that involves stateful connections. All of that cheap commodity stuff that comes with HTTP streaming above goes away, and scaling to a large number of viewers can get extremely costly (and operationally difficult).

[1] https://mux.com/blog/the-low-latency-live-streaming-landscap...

3 comments

kwindla 2171 days ago

mmcclure is way too polite to say so, but Amazon IVS is definitely not going to give you 2s latency.

Currently, IVS configured for "ultra low latency" is using HLS segments that are three seconds long. The client tries not to buffer more than one segment, so on a good network connection you'll see ~4 seconds of latency.

In theory, you could start playing the video while you're still downloading the first segment. That's how you'd get ~2s of latency. But the AWS player doesn't actually do that. And for good reason. These are TCP connections, so if there's any packet loss at all, you'll have to either buffer or skip the segment and change bitrates. Starting the video and then immediately buffering is a pretty poor user experience.

This is pretty easy to test. I just did, twice: streaming from OBS on my desktop and then directly from our compositing servers in the cloud. In both cases the latency was ~4 seconds.
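A back-of-the-envelope way to think about those numbers, as a rough sketch only: latency is roughly segment length times the number of segments the player holds, plus everything else (encode, upload, CDN, decode). The one-second overhead below is an assumption picked so that the model lines up with the ~4 seconds measured above, not a number from AWS. The six-second, three-segment row is likewise just an illustrative "traditional HLS" configuration.

```python
# Rough latency model. OVERHEAD_SECONDS is an assumed figure, chosen so that
# 3-second segments with one buffered segment land on ~4 seconds.
OVERHEAD_SECONDS = 1.0


def estimated_latency(segment_seconds, buffered_segments,
                      overhead_seconds=OVERHEAD_SECONDS):
    """Segment duration times segments buffered, plus a fixed overhead."""
    return segment_seconds * buffered_segments + overhead_seconds


# 3-second segments, one segment buffered (the setup described above).
print(estimated_latency(3.0, 1))  # 4.0

# Hypothetical traditional setup: 6-second segments, three buffered.
print(estimated_latency(6.0, 3))  # 19.0
```

The point of the model is only that segment duration sets the floor: you can't get far below one segment of delay without playing a segment that is still downloading.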


jkarneges 2171 days ago

> but typically that involves stateful connections [...] and scaling to a large number of viewers can get extremely costly (and operationally difficult)

This is why we built Pushpin, to make it easier to handle stateful connections at scale. The project is mostly intended for moving application data, but it does work for media streaming too. See our live MP3 demo [1]. The backend runs GStreamer in a loop to produce the audio, and has no awareness of client connections. Pushpin moves the bytes and knows nothing about audio or codecs.

[1] http://audiostream.fanoutapp.com/


ponker 2171 days ago

You mean all those Twitch streamers see their chats more than 4 seconds after the relevant video has elapsed? And can you explain what you mean by “stateful”? Thanks


mmcclure 2171 days ago

> all those Twitch streamers see their chats more than 4 seconds after the relevant video has elapsed

Yep! I think Twitch can sometimes be at 4 seconds or less for what it's worth, but yes, that delay is real. It's generally not really noticeable because the communication is totally async; the streamer is doing other stuff, finishing other thoughts, etc, then can get to messages as they see them.

> can you explain what you mean by “stateful”?

Sure! I was talking about what kind of connection is necessary to deliver the video. A lot of real-time video solutions require stateful connections to the client, which means that once the client connects, the connection is kept open, and video data is streamed via that connection. Common examples on the web are things like WebSockets and WebRTC. It gets really expensive, though, because you basically have to maintain a persistent connection with every single viewer, and that makes it impossible (or extremely difficult) to do any kind of meaningful caching.

Stateless connections, on the other hand, are most of your common HTTP requests. The client asks for a resource and gets it back. No prior connection setup is required; servers, networks, whatever can change between each request and everything will merrily chug along, which makes it much easier to scale.
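Here's a toy simulation of that difference, as a sketch rather than a benchmark. `Origin`, `EdgeCache`, and both delivery functions are hypothetical names invented for this example, and a single in-memory dict stands in for a whole CDN. It counts how much work lands on your own servers in each model.

```python
# Toy comparison of stateless (cacheable) vs stateful (per-connection) delivery.
# All classes and functions here are made up for illustration.


class Origin:
    """The server you pay for; counts how many requests reach it."""

    def __init__(self):
        self.requests_served = 0

    def fetch(self, path):
        self.requests_served += 1
        return f"bytes of {path}".encode()


class EdgeCache:
    """Stand-in for a CDN edge: asks the origin only on a cache miss."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.origin.fetch(path)
        return self.store[path]


def stateless_origin_fetches(viewers, segments):
    """Every viewer requests every segment through the shared edge cache."""
    origin = Origin()
    edge = EdgeCache(origin)
    for _ in range(viewers):
        for segment in segments:
            edge.get(segment)
    return origin.requests_served


def stateful_server_writes(viewers, segments):
    """The server keeps one open connection per viewer and writes to each."""
    open_connections = [[] for _ in range(viewers)]
    writes = 0
    for segment in segments:
        for connection in open_connections:
            connection.append(segment)  # one write per viewer per segment
            writes += 1
    return writes


segments = [f"segment{i}.ts" for i in range(10)]
print(stateless_origin_fetches(1000, segments))  # 10
print(stateful_server_writes(1000, segments))    # 10000
```

The edge still answers every viewer in the stateless case, but that work is done by cheap shared cache infrastructure instead of servers that have to track each viewer.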


ponker 2171 days ago

Huh, thanks, I think I’m starting to get it. Is it that the connection-based model needs the data “copied” into each user’s stream on the server-side while the “stash the files in a bin” stateless model allows the networking hardware to cache this data somewhere in memory and just copy it on the fly onto different network links?


sudhirj 2170 days ago

Stashing files (let's say each file is one second long and is just numbered with the Unix timestamp) makes things cost-effective because now the server is just dropping files number 1, 2, 3, 4... into a directory, and everyone is pulling them out in sequence the way any other file would be downloaded. This also allows exploiting the HTTP architecture: if you set a Cache-Control: public header on the files (which you can do happily because they'll never change), they'll be cached at lots of places along the way, like CDNs, the local ISP, the office network, etc. HTTPS blew out most of these caching benefits, but at least the CDNs can cache the files at edges all over the world.
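As a rough sketch of that scheme: the `.ts` extension, the function names, and the one-year `max-age` are assumptions for illustration. `public`, `max-age`, and `immutable` are standard Cache-Control directives, and `video/mp2t` is the usual MIME type for MPEG-TS segments.

```python
# Sketch of timestamp-numbered segments with cache-friendly headers.
# File naming and max-age value are hypothetical choices.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def segment_name(unix_timestamp):
    """One-second segment named after the second it covers."""
    return f"{unix_timestamp}.ts"


def segments_between(start_timestamp, end_timestamp):
    """Names of the completed segments a client would pull, in order."""
    return [segment_name(t) for t in range(start_timestamp, end_timestamp)]


def segment_response_headers(max_age_seconds=ONE_YEAR_SECONDS):
    """Headers for a segment that will never change once written."""
    return {
        "Content-Type": "video/mp2t",
        "Cache-Control": f"public, max-age={max_age_seconds}, immutable",
    }


print(segments_between(1600000000, 1600000003))
print(segment_response_headers()["Cache-Control"])
```

Because a segment's name never gets reused for different content, any cache along the way can keep it for as long as it likes.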


donavanm 2171 days ago

It's not just that it can be cached, but that it can use very standard existing infrastructure like HTTP CDNs, mobile browsers, etc. The limitation is that the audio/video is encoded as segments, each a few seconds long. Because of this it looks kind of like serial batch processing, with latency constraints based on the batch size (segment duration). This is in contrast to, say, WebRTC or RTMP, which are a lot closer to a multiplexed stream of data.


ec109685 2170 days ago

HTTP connections are stateful too. Clients aren’t doing TLS handshakes on every request, for instance.


jorams 2171 days ago

It's important to note that the 4 seconds of latency here is a pretty recent development. 30 seconds used to be the norm.
