Hacker News
by arp242 592 days ago
Last year I set up some QEMU VMs to test some things. I struggled mightily to get the FreeBSD one up and running. QEMU flags are not the easiest – lots of knobs to turn and levers to pull, but after quite a lot of time trying to get it to work, it turned out that the installer ISO was just damaged. D'oh. It's impossible to say why/how this happened, but probably during download(?)

Since then I've started to check the sums after downloading, just to be sure.
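For what it's worth, the manual check can be scripted. This is a minimal sketch, assuming the project publishes a SHA-256 sum as a hex string; the function names `sha256_of_file` and `matches_published_sum` are made up for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so a multi-gigabyte ISO does not
    have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path, expected_hex):
    """Compare the file's digest with a published hex sum (hypothetical helper)."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Calling `matches_published_sum("installer.iso", "<hex sum from the download page>")` returns `False` for a damaged image; the file name here is a placeholder.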

I wish every binary format would include a hash of the content.

Also, this is something that could be built into HTTP – it's kind of silly that I need to manually download a separate sum file and run a command to check it. Servers could send a header, and user agents could verify the hash. I don't know why this isn't part of HTTP already, because it seems pretty useful to me.

2 comments

TCP has built-in checksums that prevent most data corruption. I believe this is why it’s not part of HTTP, because TCP should already be doing this for you.

I’m guessing that for your very large file download you had an unusually high number of corrupted TCP packets and some of those were extra unlucky and still had valid checksums.
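To illustrate how corrupted data can still carry a valid checksum: TCP uses the 16-bit ones' complement Internet checksum (RFC 1071). The sketch below implements only the summation over a byte string; a real TCP checksum also covers a pseudo-header and the TCP header, which is left out here. The sample payloads are invented.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over `data`, in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Addition is order-independent, so swapping two 16-bit words goes unnoticed.
original = b"\x12\x34\x56\x78"
reordered = b"\x56\x78\x12\x34"
same_checksum = internet_checksum(original) == internet_checksum(reordered)  # True

# A single flipped bit, on the other hand, is always caught.
bit_flipped = b"\x12\x35\x56\x78"
detected = internet_checksum(original) != internet_checksum(bit_flipped)  # True
```

With only 16 bits, a random multi-bit corruption has roughly a 1 in 65536 chance of slipping through, which adds up over a very large download on a bad link.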

Or something else went wrong upstream: the TCP packets faithfully carried what some backend handed over, but that just wasn't what should have been served for 1-2 packets or whatever.
TCP's checksum is quite simple, but I would think TLS's integrity check would be much harder to fool.
It seems most likely that the corruption happened in RAM or the local storage device, after the TLS integrity check had already happened.
Scary to consider. And if that's so, it can also happen after your integrity check...
The most likely thing by far is that the download failed part way through, but the error was never reported, or the reported error was never checked.

Also, it's quite possible that the HTTP client didn't even know that the download failed: a common pattern is for the server to send no Content-Length header at all, and simply close the connection when it's done sending all of the traffic (i.e. set the TCP FIN flag on the last data packet). If the server decides to abandon the connection early for any reason, then it will... close the connection - which the client will just interpret as the end of the body, and have no idea that the file failed to download fully.
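A rough sketch of that failure mode, using an in-memory stream in place of a socket. `read_body` and `TruncatedBody` are hypothetical names, not part of any HTTP library: when the client knows the expected length it can notice a short read, and when it does not, end-of-stream is all it has to go on.

```python
import io


class TruncatedBody(Exception):
    """Raised when fewer (or more) bytes arrive than the server declared."""


def read_body(stream, declared_length=None, chunk_size=65536):
    """Read `stream` until EOF and return (body, verified).

    `verified` is True only if a declared length was available and matched.
    With no declared length, EOF is simply treated as the end of the body,
    so an early connection close cannot be told apart from success.
    """
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF: the peer closed the connection
            break
        received.extend(chunk)
    if declared_length is None:
        return bytes(received), False
    if len(received) != declared_length:
        raise TruncatedBody(
            f"expected {declared_length} bytes, got {len(received)}"
        )
    return bytes(received), True
```

For example, `read_body(io.BytesIO(b"partial"), declared_length=100)` raises `TruncatedBody`, while `read_body(io.BytesIO(b"partial"))` quietly returns the seven bytes with `verified` set to `False`.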

> I don't know why this isn't part of HTTP already

It could probably be improved, but HTTP does support this already:

https://datatracker.ietf.org/doc/html/rfc2616#section-14.15
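That section defines the Content-MD5 header: the base64 encoding of the 128-bit MD5 digest of the entity body. A minimal sketch of how either side could compute and check the value is below; `content_md5` and `body_matches_header` are names made up for this example. Note that MD5 only guards against accidental corruption, and that later revisions of the HTTP spec (RFC 7231) dropped this header.

```python
import base64
import hashlib
import hmac


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 value for `body`: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_matches_header(body: bytes, header_value: str) -> bool:
    """Check a received body against a Content-MD5 header value (hypothetical helper)."""
    return hmac.compare_digest(content_md5(body), header_value.strip())
```

A client that did this would reject a body whose digest differs from the header, for instance `body_matches_header(b"hellp", content_md5(b"hello"))` is `False`.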

Ah, nice; I didn't know about that. Do Firefox and Chrome actually check it though?