by AlienRobot
902 days ago
In Python, the Adler-32 checksum (`zlib.adler32`) returns a 32-bit value. It works well enough when you're comparing one file to another. It doesn't work well if you want to, for example, create a quick fingerprint that (tries to) uniquely identify tens of thousands of files. On StackOverflow I saw someone say they got hash collisions with MD5 (128-bit) after hashing around 20k files. When I tried making something similar, I figured that adding the file's size in bytes to the hash would reduce the number of collisions, since a collision would then require two files of exactly the same size that also produce the same MD5 hash. Still feels random and unavoidable in the grand scheme of things, though.
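
To put rough numbers on that, here is a minimal sketch of the usual birthday-bound approximation. It assumes the hash values are uniformly random, which is optimistic for Adler-32 (it is known to be weak on short inputs), and `collision_probability` is just a name made up for this example.

```python
import math

def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items hashes collide.

    Assumes every hash is an independent, uniformly random hash_bits-bit value.
    """
    pairs = num_items * (num_items - 1) / 2
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)

if __name__ == "__main__":
    for bits in (32, 128):
        print(bits, collision_probability(20_000, bits))
```

Under that assumption, 20k files give about a 4.5% chance of at least one collision at 32 bits and a vanishingly small chance at 128 bits, so an accidental MD5 collision at that scale would be very surprising.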
No need to compute the hash until there are at least two files with the same size.
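
A rough sketch of that idea, assuming local files and MD5 from `hashlib`: group by size first, then hash only the groups with more than one file. The helper names `file_md5` and `find_duplicate_candidates` are invented for illustration.

```python
import hashlib
import os
from collections import defaultdict

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicate_candidates(paths):
    """Return lists of paths that share both size and MD5 digest."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_fingerprint = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a unique size cannot match any other file, so skip hashing
        for path in same_size:
            by_fingerprint[(size, file_md5(path))].append(path)

    return [group for group in by_fingerprint.values() if len(group) > 1]
```

Files that share a size and digest are still only candidates; a byte-by-byte comparison (for example `filecmp.cmp(a, b, shallow=False)`) would settle it for certain.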