by fc417fc802
146 days ago
> The 32 bit hash of CRC32 is too low for file checksums.

What makes you say this? I agree that there are better algorithms than CRC32 for this use case, but if I were implementing something I'd most likely still truncate the hash to somewhere in the same ballpark (likely 32, 48, or 64 bits).

Note that the purpose of the hash is important. These aren't being used for deduplication, where you need a guaranteed unique value between all independently queried pieces of data globally, but rather just to detect file corruption. At 32 bits a corrupted file has only about a 1 in 2^32 chance of slipping through undetected (a false negative). That should be more than enough. By the time you make it to 64 bits, if you encountered a corrupted file once _every nanosecond_ for the next 500 years or so, you would expect to miss only a single event. That is a rather absurd level of reliability in my view.
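
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes an idealized n-bit checksum where a corrupted file produces an effectively uniform random value, so the miss probability is 2^-n (real CRCs do better than this on some error patterns, such as short bursts). The function names are mine, purely for illustration.

```python
# Back-of-the-envelope check, assuming an idealized n-bit checksum:
# a corrupted file matches the stored checksum with probability 2**-n.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, close enough here


def miss_probability(bits: int) -> float:
    """Chance that a single corrupted file goes undetected."""
    return 2.0 ** -bits


def years_until_one_expected_miss(bits: int, events_per_second: float) -> float:
    """Time until the expected number of undetected corruptions reaches 1."""
    events_needed = 2.0 ** bits  # expected misses = events * 2**-bits
    return events_needed / events_per_second / SECONDS_PER_YEAR


# One corrupted file per nanosecond is 1e9 corruption events per second.
for bits in (32, 48, 64):
    p = miss_probability(bits)
    years = years_until_one_expected_miss(bits, events_per_second=1e9)
    print(f"{bits} bits: p(miss) = {p:.3e}, ~{years:.3g} years per expected miss")
```

At one corruption per nanosecond the 64-bit case works out to a bit under 600 years, which is where the "500 years or so" figure comes from. At that same (absurd) rate a 32-bit checksum would be expected to miss one within seconds, but actual corruption rates are many orders of magnitude lower.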
The README in the SMHasher test suite also seems to suggest that 32 bits might be too few for file checksums:
"Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit."