Suppose I did want to use the file hash as a unique key, and I really don't want to do a byte-for-byte comparison. I care about speed but not so much about bad actors. What should I use?
I didn't think about it much, but file size should be a good extra check as long as the hash isn't horrible. An MD5 + file size comparison could work for your use case.
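Here is a minimal Python sketch of the MD5 + file size key, assuming files are identified by local paths. `file_key` and `group_by_key` are hypothetical helper names, not part of any library. Each file is read in chunks so large files don't need to fit in memory, and files whose `(size, md5)` keys match are treated as identical.

```python
import hashlib
import os
from collections import defaultdict


def file_key(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hexdigest) for the file at `path` (hypothetical helper)."""
    size = os.path.getsize(path)
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return size, h.hexdigest()


def group_by_key(paths):
    """Group paths by their (size, MD5) key; each group is treated as one unique file."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_key(p)].append(p)
    return dict(groups)
```

Files that end up in the same group are assumed to be duplicates without a byte-for-byte check, which is the trade-off the question is willing to accept.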
One of the inputs to MD5 is the length of the message (it is appended during padding), so I'm wrong, at least in the case of MD5. I don't know about the general case, and although I'm interested in the answer, I can't spend time on it right now. If anyone has a pointer to a useful resource, please reply.
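To make the point about the length concrete, here is a small sketch of the MD5 padding step as described in RFC 1321. It only computes the padding bytes, not the full hash. `md5_padding` is a hypothetical helper name. The message is extended with a single 0x80 byte, then zero bytes until its length is 56 mod 64, then the original length in bits as a 64-bit little-endian integer.

```python
def md5_padding(message_len: int) -> bytes:
    """Return the padding MD5 appends to a message of `message_len` bytes (per RFC 1321)."""
    # Original length in bits, taken mod 2**64 as the spec requires.
    bit_len = (message_len * 8) & 0xFFFFFFFFFFFFFFFF
    # A single 1 bit followed by zero bits, i.e. the byte 0x80.
    pad = b"\x80"
    # Zero bytes until the padded length is 56 mod 64 (leaving room for 8 length bytes).
    pad += b"\x00" * ((55 - message_len) % 64)
    # The length itself, as a 64-bit little-endian integer.
    pad += bit_len.to_bytes(8, "little")
    return pad
```

This shows that the size is mixed into the data MD5 actually processes. It does not by itself rule out collisions between files of different sizes.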