by danbruc 3416 days ago
I think the article mixes up two things, namely the hash function business and uninformative priors.

Hash functions. In the case of a good hash function, the information about the actual distribution of the inputs is erased. That the hashes are uniformly distributed does not imply that the inputs are uniformly distributed, which in turn invalidates any conclusion about the distribution of the inputs after applying a transformation.

Uninformative priors. This seems to be more of a philosophical problem that I don't really know much about. But as far as I can tell, if one tries to quantify a lack of information in a naive way using probability distributions, then one gains information, for example that the value is uniformly distributed over some range, and this information has consequences such as specific distributions of derived values.

So the attempt to quantify a lack of information turns into a self-defeating endeavor, not necessarily because of any inherent information but because of the information injected during the modeling process.
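
To make the "specific distributions of derived values" point concrete, here is a minimal sketch. The uniform prior on x over [0, 1] and the derived value y = x**2 are my own assumptions, chosen only for illustration; they do not come from the article. An evenly spaced grid stands in for the uniform prior so that the result is deterministic.

```python
# Illustration only: the range [0, 1] and the derived value y = x**2
# are assumptions made for this example.
N = 1000

# Midpoints of N equal-width bins: an evenly spaced stand-in for x ~ Uniform(0, 1).
xs = [(i + 0.5) / N for i in range(N)]
ys = [x * x for x in xs]


def share_below(values, threshold):
    """Return the fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


x_share = share_below(xs, 0.25)  # uniform x: a quarter of the mass lies below 0.25
y_share = share_below(ys, 0.25)  # y < 0.25 exactly when x < 0.5: half of the mass

print(x_share, y_share)
```

Declaring ignorance about x as "uniform" therefore commits you to a non-uniform statement about x**2, which is the injected information described above.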

1 comments

That is not a very precise way of describing the uniformity of hash functions, and it's not actually true. For example, if my input has some element with 50% probability, then the hashed output is going to have some element with at least 50% probability.
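
A small sketch of that point, assuming SHA-256 as the hash and a made-up input sample in which one message accounts for half of the inputs:

```python
import hashlib
from collections import Counter


def digest(message: bytes) -> str:
    """Hex SHA-256 digest of a message."""
    return hashlib.sha256(message).hexdigest()


# Hypothetical sample: b"hello" makes up 50 of the 100 inputs,
# the other 50 messages are all distinct.
inputs = [b"hello"] * 50 + [f"msg-{i}".encode() for i in range(50)]

counts = Counter(digest(m) for m in inputs)
top_digest, top_count = counts.most_common(1)[0]
top_share = top_count / len(inputs)

# A hash is deterministic, so equal inputs always give equal outputs
# and the repeated message shows up as a repeated digest.
print(top_share, top_digest == digest(b"hello"))
```

The share could only go above 50% if a different message collided with the repeated one, hence "at least".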

But this is well-known, and because it's well-known, no sane cryptographic system cares that hash outputs leak information that way. For example, look at PBKDF2, HMAC, or various asymmetric key authentication schemes.
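
One way to read this, and it is my interpretation rather than something spelled out above: constructions like PBKDF2 and HMAC mix a salt or key into the computation, so a repeated input does not have to produce a repeated output across users or contexts. A minimal sketch with Python's standard library, where the password, salts, keys, and the low iteration count are all made up for the demo:

```python
import hashlib
import hmac

password = b"correct horse"  # hypothetical input that repeats across users
salt_a = b"salt-for-alice"   # hypothetical per-user salts
salt_b = b"salt-for-bob"


def derive(pw: bytes, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256; 1000 iterations keeps the demo fast, not secure."""
    return hashlib.pbkdf2_hmac("sha256", pw, salt, 1000)


key_a = derive(password, salt_a)
key_b = derive(password, salt_b)

# HMAC: the same message under two different (hypothetical) keys.
tag_1 = hmac.new(b"key-one", password, hashlib.sha256).digest()
tag_2 = hmac.new(b"key-two", password, hashlib.sha256).digest()

print(key_a != key_b, tag_1 != tag_2)
```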

You are right, I did not word what I wanted to say correctly. I was not thinking about repeated messages; I was only thinking about subsets of all possible messages. Say you are hashing HTML documents: all your inputs will be in the subspace having <!DOCTYPE html><html as their first characters, but that information will be erased by the hash function.
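
Roughly what I have in mind, sketched with SHA-256 and a set of made-up documents that all share that prefix (the document bodies are hypothetical):

```python
import hashlib

PREFIX = b"<!DOCTYPE html><html"

# Hypothetical documents: identical first characters, different bodies.
docs = [PREFIX + f"><body>page {i}</body></html>".encode() for i in range(1000)]
digests = [hashlib.sha256(d).digest() for d in docs]

# Every input lies in the same small subspace of all byte strings ...
all_share_prefix = all(d.startswith(PREFIX) for d in docs)

# ... but the digests show no such structure: their first bytes are spread
# over many values instead of being pinned to one.
distinct_first_bytes = len({h[0] for h in digests})
distinct_digests = len(set(digests))

print(all_share_prefix, distinct_first_bytes, distinct_digests)
```

Looking only at the digests, you could not tell that every input started with the same 20 characters.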