by dbrueck 591 days ago
Interesting! It's worth noting though that HTTP actually works very well for reliably downloading large immutable files.

And since this proposed protocol operates over TCP, there's relatively little that can be done to achieve the performance goals vs what you can already do with HTTP.

And because "everything" already speaks HTTP, you can get pretty close to max performance just via client side intelligence talking to existing backend infrastructure, so there's no need to try to get people to adopt a new protocol. Modern CDNs have gobs of endpoints worldwide.

A relatively simple client can do enough range requests in parallel to saturate typical last-mile pipes, and more intelligent clients can do fancy things to get max performance.
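
To make the "relatively simple client" idea concrete, here is a minimal sketch in Python. It assumes the total size is already known (say, from a HEAD request) and that the server honors Range requests. The function names and the default of 8 parts are illustrative choices of mine, not anything from an existing tool.

```python
import concurrent.futures
import urllib.request


def split_ranges(total_size, parts):
    """Split [0, total_size) into up to `parts` inclusive (start, end) byte ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def fetch_range(url, start, end):
    """Fetch one inclusive byte range; a 206 Partial Content reply is expected."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError("server ignored the Range header")
        return resp.read()


def parallel_download(url, total_size, parts=8):
    """Download all ranges concurrently and reassemble them in order."""
    ranges = split_ranges(total_size, parts)
    with concurrent.futures.ThreadPoolExecutor(max_workers=parts) as pool:
        # pool.map preserves input order, so the chunks join back correctly.
        chunks = pool.map(lambda r: fetch_range(url, *r), ranges)
        return b"".join(chunks)
```

This holds everything in memory, so a real client would write each chunk to its offset in a file instead, and would retry failed ranges.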

For example, some clients will do range requests against all IPs returned from DNS resolution to detect which servers are "closer" or less busy, and for really large downloads, they'll repeat this throughout the download to constantly meander towards the fastest sources. Another variation (which might be less common these days): if the initial response is a redirect, it may imply redirects are being used as a load distribution mechanism, so clients can re-request the original URL throughout the download to see if a different set of servers gets offered up as potentially faster sources. Again, all of this works today with plain old HTTP.
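
A rough sketch of the "probe every IP" idea, in Python. Note the swap: instead of timing real range requests as described above, this times a plain TCP connect to each address, which is a cruder stand-in for "closeness" and says nothing about how busy the server is. The helper names, port, and timeout are my own assumptions.

```python
import socket
import time


def resolve_all(host, port=443):
    """Return the distinct IP addresses DNS offers for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def connect_time(ip, port=443, timeout=3.0):
    """Time a TCP connect to ip; returns None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def rank_sources(timings):
    """Order IPs fastest first, dropping any that failed to answer."""
    reachable = {ip: t for ip, t in timings.items() if t is not None}
    return sorted(reachable, key=reachable.get)
```

A client would call `rank_sources({ip: connect_time(ip) for ip in resolve_all(host)})` periodically during a long download and shift its range requests towards the front of the list.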


Last year I set up some QEMU VMs to test some things. I struggled mightily to get the FreeBSD one up and running. QEMU flags are not the easiest, with lots of knobs to turn and levers to pull, but after quite a lot of time trying to get it to work, it turned out that the installer ISO was just damaged. D'oh. It's impossible to say why or how this happened, but probably during the download(?)

Since then I've started to check the sums after downloading, just to be sure.
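
For what it's worth, the manual check is only a few lines of Python if you'd rather not rely on a separate command. This is a minimal sketch that assumes the project publishes a SHA-256 sum as a hex string; the function names are mine.

```python
import hashlib
import hmac


def sha256_of_file(path, block_size=1 << 20):
    """Hash the file in 1 MiB blocks so a large ISO never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_sum(path, expected_hex):
    """Compare against the published hex digest, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```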

I wish every binary format would include a hash of the content.

Also this is something that can be in HTTP – it's kind of silly I need to manually download a separate sum file and run a command to check it. Servers can send a header, and user agents can verify the hash. I don't know why this isn't part of HTTP already, because it seems pretty useful to me.

TCP has built in checksums that prevent most data corruption. I believe this is why it’s not part of HTTP, because TCP should already be doing this for you.

I’m guessing that for your very large file download you had an unusually high number of corrupted TCP packets and some of those were extra unlucky and still had valid checksums.

Or something else went wrong upstream: the TCP packets were correct for what some backend handed them, it just wasn't what should have been served for one or two packets or so.
TCP's checksum is quite simple, but I would think TLS's integrity check would be much harder to fool.
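
To show how simple: the TCP checksum is a 16-bit ones' complement sum, so it can't notice 16-bit words being reordered, for instance. A small Python sketch of the sum in the style of RFC 1071 (real TCP also covers a pseudo-header, which is left out here):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data, RFC 1071 style."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd"
swapped = b"\xab\xcd\x12\x34"  # same 16-bit words, different order
same = internet_checksum(original) == internet_checksum(swapped)  # True
```

Since addition is commutative, the two payloads differ but produce the same checksum.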
It seems most likely that the corruption happened in RAM or the local storage device, after the TLS integrity check had already happened.
Scary to consider. And if that's so, it can also happen after your integrity check...
The most likely thing by far is that the download failed part way through, but the error was never reported, or the reported error was never checked.

Also, it's quite possible that the HTTP client didn't even know that the download failed: a common pattern is for the server to send no Content-Length header at all, and simply close the connection when it's done sending all of the traffic (i.e. set the TCP FIN flag on the last data packet). If the server decides to abandon the connection early for any reason, then it will... close the connection, which the client will just interpret as the end of the body, with no idea that the file failed to download fully.
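
Put as code, the client-side logic looks roughly like this. It's a sketch with a hypothetical function name and return labels, and it assumes the headers arrive as a dict with lower-cased names.

```python
def download_status(headers, bytes_received):
    """Classify a finished transfer from its response headers and byte count.

    headers: dict mapping lower-cased header names to their values.
    """
    if "chunked" in headers.get("transfer-encoding", "").lower():
        # Chunked bodies end with an explicit final chunk, so an early
        # close is detectable by the HTTP parser itself.
        return "framed"
    declared = headers.get("content-length")
    if declared is None:
        # Close-delimited body: a clean EOF and an abort look identical.
        return "unverifiable"
    return "complete" if bytes_received == int(declared) else "truncated"
```

Only the "unverifiable" case is the dangerous one: with no declared length and no chunking, the client has nothing to compare the byte count against.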

> I don't know why this isn't part of HTTP already

It could probably be improved, but HTTP does support this already, via the Content-MD5 header:

https://datatracker.ietf.org/doc/html/rfc2616#section-14.15
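
That section defines the Content-MD5 header, whose value is the base64 encoding of the body's 128-bit MD5 digest. A minimal Python sketch of what a verifying client would do (function names are mine). Note that RFC 7231, which superseded RFC 2616, removed this header, so don't expect current servers to send it.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Compute the Content-MD5 value: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_matches_header(body: bytes, header_value: str) -> bool:
    """Check a received body against the Content-MD5 header value."""
    return content_md5(body) == header_value.strip()
```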

Ah, nice; I didn't know about that. Do Firefox and Chrome actually check it, though?