>> If it wasn’t, we couldn’t stream video without loading the entire file first
I don't believe this is correct. To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.
Yes, the statement is patently wrong. There are a few very popular video formats whose main feature is chunking through HTTP, like HTTP Live Streaming or MPEG-DASH.
I believe that's standard for Netflix, etc, but is it also true for plain webms and mp4s in a <video> tag? I thought those were downloaded in one request but had enough metadata at the beginning to allow playback to start before the file is completely downloaded.
Browsers talking to static web servers use HTTP byte range requests to get chunks of videos and can use the same mechanism to seek to any point in the file.
Streaming that way is fast and simple. No fancy technology required.
For MP4 to work that way you need to render it as fragmented MP4.
Why would the browser send byte range requests for video tags if it expects to play the file back linearly from beginning to end anyway? Wouldn't that be additional overhead/round-trips?
My original comment was about the commenter I replied to saying:
> To my knowledge, video stream requests chunks by range and is largely client controlled. It isn't a single, long lived http connection.
Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"? Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?
> Wouldn't a byte range request for the whole file fall under the "single, long lived http connection"?
Yes, it would (though a better description would be "a single, long lived http request" because this doesn't have anything to do with connections), and wewewedxfgdf also replied Yes.
> Sure it could be terminated early and another request made for seeking, but regardless the video can start before the whole file is downloaded, assuming it's encoded correctly?
Seconded, I've done a userland 'Content-Range' implementation myself. Of course, there were still a few ffmpeg-specific parameters the mp4 needed to work right.
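For anyone curious what a userland implementation roughly involves, here is a minimal sketch in Python. It only handles a single `bytes=` range against an in-memory body; the function names (`parse_byte_range`, `partial_response`) are made up for illustration, and answering 416 for every unusable header is a simplification (real servers often ignore a malformed header and send the whole file with a 200).

```python
from typing import Dict, Optional, Tuple


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single-range 'Range' header against a file of `size` bytes.

    Returns inclusive (start, end) offsets, or None if the range is
    malformed or cannot be satisfied.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # multi-range requests are out of scope for this sketch
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    try:
        if first == "":                        # suffix form: "bytes=-500"
            length = int(last)
            if length <= 0 or size == 0:
                return None
            return max(size - length, 0), size - 1
        start = int(first)
        end = int(last) if last else size - 1  # open-ended form: "bytes=500-"
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)


def partial_response(body: bytes, range_header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a hypothetical in-memory file."""
    rng = parse_byte_range(range_header, len(body))
    if rng is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = rng
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, body[start:end + 1]
```

A seek in the player then just turns into a new request such as `Range: bytes=1048576-`, which this resolves to "from that offset to the end of the file".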
It’s not true because throwing a video file as a source on video tag has no information about the file being requested until the headers are pushed down. Hell, back in 2005 Akamai didn’t even support byte range headers for partial content delivery, which made resuming videos impossible. I believe they pushed out the update across their network in 06 or 07.
If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.
Just putting a url in my Chromium based browser’s address bar to an mp4 file we have hosted on CloudFlare R2 “just works” (I expect a video tag would be the same), supporting skipping ahead in the video without having to download the whole thing.
Initially skipping ahead didn’t work until I disabled caching on CloudFlare CDN, as that breaks the “Accept-Ranges” capability on videos. For now we have a negligible amount of viewership of these mp4s, but if it becomes an issue we’ll use CloudFlare’s video serving product.
> If your HTTP server provides and supports the appropriate headers and you’re serving supported file types, then it absolutely is true.
No. When you play a file in the browser with a video tag, it requests the file. It doesn’t ask for a range. It does use the range if you seek it, or you write the JavaScript to fetch based on a range. That’s why if you press play and pause it buffers the whole video. Only if you write the code yourself can you partially buffer a file like YouTube does.
Nah, it uses complex video specific logic and http range requests as protocol. (At least the normal browsers and servers. You can roll your own dumb client/server of course.)
> That’s why if you press play and pause it buffers the whole video.
Obviously it doesn’t initially ask for a range if it starts from the beginning of the video, but it starts playing video immediately without requiring the whole file to download. When you seek, it cancels the current request and then does a range request. At no point does it “have” to cache the entire file.
I suppose if you watch it from start to finish without seeking it might cache the entire file, but it may alternatively keep a limited amount cached of the video and if you go back to an earlier time it may need to re-request that part.
Your confidence seems very high on something which more than one person has corrected you on now, perhaps you need to reassess the current state of video serving, keeping in mind it does require HTTP servers to allow range requests.
You can also watch it happen - the Chrome developer tools network tab will show you the traffic that goes to and from the web browser to the server and you can see this process in action.
Who cares what happened in 2005? This is so rare nowadays, I've only really seen it on websites that are constructing the file as they go, such as the Github zip download feature.
2005 is basically the dark ages of the web. It’s pre-Ajax and IE6 was the dominant browser. Using this as an argument is like saying apps aren’t suitable because the iPhone didn’t have an App Store until 2008.
> It’s not true because throwing a video file as a source on video tag has no information about the file being requested until the headers are pushed down.
And yet, if you stick a web server in front of a video and load it in chrome, you’ll see just that happening.
<video controls>
<source src="/video/sample.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
Put that snippet into an HTML file and run it against this pastebin [0], and you'll see that Chrome (and Safari) both do range requests out of the box if the file is big enough.
They can play back while loading as long as they are encoded correctly, fwiw (faststart encoded).
When you create a video from a device the header is actually at the end of the file. Understandable, it’s where the file pointer was and mp4 allows this so your recording device writes it at the end. You must re-encode with faststart (puts the moov atom at the start) to make it load reasonably on a webpage though.
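If you want to check which layout a given file has, a rough sketch like the one below walks the top-level MP4 boxes and reports whether `moov` comes before `mdat`. The helper names (`top_level_boxes`, `is_faststart`, `box`) are hypothetical, the toy boxes are not playable video, and a real tool would read only the box headers from disk instead of holding the whole file in memory.

```python
import struct
from typing import Iterator, Tuple


def top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in an MP4 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                      # 64-bit "largesize" follows the type
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:                    # box runs to the end of the file
            size = len(data) - offset
        if size < header:
            break                          # corrupt box: stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def is_faststart(data: bytes) -> bool:
    """True if the moov box appears before the first mdat box."""
    for box_type, _, _ in top_level_boxes(data):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a toy box for the demonstration below."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


ftyp = box(b"ftyp", b"isom\x00\x00\x02\x00")
moov = box(b"moov", b"\x00" * 16)
mdat = box(b"mdat", b"\x00" * 64)

print(is_faststart(ftyp + mdat + moov))  # camera-style layout, prints False
print(is_faststart(ftyp + moov + mdat))  # web-friendly layout, prints True
```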
> Understandable, it’s where the file pointer was and mp4 allows this so your recording device writes it at the end.
Yet formats like WAVE, which use a similar "chunked" encoding, just use a fixed-length header and a single seek() to get back to it when finalizing the file. QuickTime and WAVE were released at nearly the same time in the early 90s.
MP2 was so much better I cringe every time I have to deal with MP4 in some context.
At the expense of quite some overhead though, right?
MPEG-2 transport streams seem more optimized for a broadcast context, with their small frame structure and everything – as far as I know, framing overhead is at least 2%, and is arguably not needed when delivered over a reliable unicast pipe such as TCP.
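The "at least 2%" figure falls out of the packet layout: a transport stream packet is 188 bytes, of which 4 are the fixed header. A quick back-of-the-envelope sketch follows (function names are made up for illustration; this is only a lower bound, since adaptation fields, PES headers and PAT/PMT tables add more on top):

```python
TS_PACKET_SIZE = 188   # bytes per transport stream packet
TS_HEADER_SIZE = 4     # fixed header: sync byte, flags, PID, continuity counter


def ts_min_overhead() -> float:
    """Fraction of each packet spent on the fixed 4-byte header."""
    return TS_HEADER_SIZE / TS_PACKET_SIZE


def ts_bytes_on_wire(payload_bytes: int) -> int:
    """Bytes needed to carry `payload_bytes`, counting only the fixed headers."""
    per_packet = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 payload bytes per packet
    packets = -(-payload_bytes // per_packet)      # ceiling division
    return packets * TS_PACKET_SIZE


print(f"{ts_min_overhead():.2%}")   # prints 2.13%
print(ts_bytes_on_wire(184_000))    # prints 188000
```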
Still, being able to essentially chop a single, progressively written MPEG TS file into various chunks via HTTP range requests or very simple file copy operations without having to do more than count bytes, and with self-synchronization if things go wrong, is undoubtedly nicer to work with than MP4 objects. I suppose that's why HLS started out with transport streams and only gained fMP4 support later on.
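The self-synchronization part is easy to sketch: every packet starts with the sync byte 0x47, so after cutting a file at an arbitrary byte you scan for a position where 0x47 repeats every 188 bytes. The code below is an illustration under that assumption; `find_sync`, `packets_from` and the `confirm` threshold of three packets are my own choices, not anything mandated by the format.

```python
from typing import List

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def find_sync(data: bytes, confirm: int = 3) -> int:
    """Return the first offset where `confirm` consecutive packets start with 0x47.

    Checking several packets guards against a stray 0x47 inside a payload.
    Returns -1 if no such offset exists (for example, too little data).
    """
    last_start = len(data) - TS_PACKET_SIZE * (confirm - 1)
    for offset in range(max(last_start, 0)):
        if all(data[offset + i * TS_PACKET_SIZE] == SYNC_BYTE for i in range(confirm)):
            return offset
    return -1


def packets_from(data: bytes) -> List[bytes]:
    """Split a possibly misaligned byte range into whole 188-byte packets."""
    start = find_sync(data)
    if start < 0:
        return []
    usable = (len(data) - start) // TS_PACKET_SIZE
    return [
        data[start + i * TS_PACKET_SIZE: start + (i + 1) * TS_PACKET_SIZE]
        for i in range(usable)
    ]


# Five toy packets, then a cut that lands 100 bytes into the first one.
stream = b"".join(b"\x47" + bytes([i]) * 187 for i in range(1, 6))
cut = stream[100:]
print(find_sync(cut))          # prints 88
print(len(packets_from(cut)))  # prints 4
```

Nothing comparable works on a plain MP4 without parsing the box structure first, which is the convenience being described.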
> and is arguably not needed when delivered over a reliable unicast pipe such as TCP.
So much content ended up being delivered this way, but there was a brief moment where we thought multicast UDP would be much more prevalent than it ended up being. In that context it's perfect.
> why HLS started out with transport streams and only gained fMP4 support later on.
Which I actually think was the motivation to add fMP4 to base MP4 in the first place. In any case I think MPEG also did a better job with DASH technically but borked it all up with patents. They were really stupid with that in the early 2010s.
Multicast UDP is widely used - but not on the Internet.
We often forget there are networks other than the Internet. Understandable, since the Internet is most open. The Internet is just an overlay network over ISPs' private networks.
SCTP is used in cellphone networks and the interface between them and legacy POTS networks. And multicast UDP is used to stream TV and/or radio throughout a network or building. If you have a "cable TV" box that plugs into your fiber internet connection, it's probably receiving multicast UDP. The TV/internet company has end-to-end control of this network, so they use QoS to make sure these packets never get dropped. There was a write-up posted on Hacker News once about someone at a hotel discovering a multicast UDP stream of the elevator music.
The long answer is "it depends on how you do it". Unsurprisingly, video and voice/audio are probably the most different in the ways that you can "choose" to do distribution.
Yea this works for mp4 and HN seems confused about how.
The MOOV atom is how range requests are enabled, but the browser has to find it first. That's why it looks like it's going to download the whole file at first. It doesn't know the offset. Once it reads it, the request will be cancelled and targeted range requests will begin.
The two are essentially the same thing, modulo trading off some unnecessary buffering on both sides of the TCP pipe in the "one big download" streaming model for more TCP connection establishments in the "range request to refill the buffer" one.
It’s usually written to the end since it’s not a fixed size and it’s a pain for recording and processing tools to rewrite the whole file on completion just to move the header to the start. You should always re-encode to move the header to the start for web though.
It’s something you see too much of online once you know about it but mp4 can absolutely have the header at the start.
For "VOD", that works (and is how very simple <video> tag based players sometimes still do it), but for live streaming, it wouldn't – hence the need for fragmented MP4, MPEG-DASH, HLS etc.
It does work for simpler codecs/containers though: Shoutcast/Icecast web radio streams are essentially just endless MP3 downloads, optionally with some non-MP3 metadata interspersed at known intervals.
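To illustrate the "metadata interspersed at known intervals" part: in the ICY scheme the server announces an interval (the `icy-metaint` response header), and after every that-many audio bytes comes one length byte followed by length × 16 bytes of NUL-padded metadata. A minimal sketch of pulling the two apart, with hypothetical helper names and a tiny interval chosen only to keep the demo readable:

```python
from typing import List, Tuple


def split_icy(stream: bytes, metaint: int) -> Tuple[bytes, List[str]]:
    """Separate audio bytes from interleaved ICY metadata blocks."""
    audio = bytearray()
    titles: List[str] = []
    pos = 0
    while pos < len(stream):
        audio += stream[pos:pos + metaint]   # next run of pure audio
        pos += metaint
        if pos >= len(stream):
            break
        length = stream[pos] * 16            # length byte counts 16-byte units
        pos += 1
        if length:
            block = stream[pos:pos + length].rstrip(b"\x00")
            titles.append(block.decode("utf-8", errors="replace"))
        pos += length
    return bytes(audio), titles


def icy_block(text: str) -> bytes:
    """Encode one metadata block (helper for the demonstration)."""
    raw = text.encode("utf-8")
    units = -(-len(raw) // 16)               # round up to whole 16-byte units
    return bytes([units]) + raw.ljust(units * 16, b"\x00")


demo = (
    b"AAAAAAAA" + icy_block("StreamTitle='Song';")
    + b"BBBBBBBB" + icy_block("")            # a zero length byte means "no change"
    + b"CCCC"
)
audio, titles = split_icy(demo, metaint=8)
print(audio)   # prints b'AAAAAAAABBBBBBBBCCCC'
print(titles)  # prints ["StreamTitle='Song';"]
```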
Correct.
HLS and DASH are industry standards. Essentially the client downloads a file which lists the available files by bitrate and chunk, and the client determines which is best for the given connectivity.
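As a rough sketch of that client-side decision for HLS: the master playlist tags each variant with a `BANDWIDTH` attribute, and the player picks the highest one that fits under its measured throughput. The playlist contents, the function names and the 0.8 safety factor below are assumptions for illustration; real players use more elaborate heuristics (buffer level, screen size, and so on).

```python
import re
from typing import List, Tuple

# Match BANDWIDTH but not AVERAGE-BANDWIDTH.
BANDWIDTH_RE = re.compile(r"(?:^|[:,])BANDWIDTH=(\d+)")


def parse_variants(master_playlist: str) -> List[Tuple[int, str]]:
    """Return (bandwidth_bps, uri) pairs sorted from lowest to highest bitrate."""
    lines = [line.strip() for line in master_playlist.splitlines() if line.strip()]
    variants = []
    for tag, uri in zip(lines, lines[1:]):
        # The URI of a variant is the line right after its EXT-X-STREAM-INF tag.
        if tag.startswith("#EXT-X-STREAM-INF:") and not uri.startswith("#"):
            match = BANDWIDTH_RE.search(tag)
            if match:
                variants.append((int(match.group(1)), uri))
    return sorted(variants)


def pick_variant(variants: List[Tuple[int, str]], measured_bps: float,
                 safety: float = 0.8) -> str:
    """Pick the best variant under the budget, or the lowest one if none fits.

    Expects a non-empty, sorted list as returned by parse_variants.
    """
    budget = measured_bps * safety
    affordable = [v for v in variants if v[0] <= budget]
    chosen = affordable[-1] if affordable else variants[0]
    return chosen[1]


MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=5000000,BANDWIDTH=6000000,RESOLUTION=1920x1080
high/index.m3u8
"""

variants = parse_variants(MASTER)
print(pick_variant(variants, measured_bps=4_000_000))  # prints mid/index.m3u8
```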
And even if you are using a "regular" video format like mp4, browsers will still use range requests [1] to fetch chunks of the file in separate requests, assuming the server supports it (which most do).