Hacker News new | ask | show | jobs
by PaulAJ 3956 days ago
I thought that entropy solved this problem: your "weakly" random numbers from the thermometer might have, say, 1.3 bits of entropy for every reading. So you assemble 100 readings and that gives you 130 bits of randomness, which you extract by putting your 100 readings through a cryptographic hash algorithm that outputs 130 bits.

Presumably I'm missing something. Can someone tell me what it is?

2 comments

Whether a hash function is cryptographic or not is irrelevant: either way, it's computable, therefore you can reverse it and get information about the original data you hashed. Because that data was only weakly random, it's not independent of future data from the same sources, so you can compute what future results of the hashing algorithm are likely to be.

Obviously this is impractical in the real world, but in TCS randomness is defined as algorithmic randomness, which completely ignores the question of running time. Randomness is a mathematical concept, something can't suddenly become non-random because you gained a faster computer. For more information see: https://en.wikipedia.org/wiki/Kolmogorov_complexity#Kolmogor...

To correct that: there are many different definitions of randomness, and they're not actually using a form of algorithmic randomness in this paper (there are many definitions of that too), which I guess isn't so relevant for randomness extraction.
That's the right intuition. But here we have to distinguish between information theoretic randomness and computational randomness (also known as algorithmic randomness). In the former, we're guaranteed that the output is actually statistically close to uniform randomness, and no statistical test, no matter how much computing time it uses, can predict any bit of the output. In the latter, the output can actually be far from uniformly random, but it has the guarantee that no efficient program (i.e. polynomial time algorithm) can distinguish between the output, and true uniform randomness.

A randomness extractor guarantees the former (i.e. information theoretic randomness), whereas a cryptographic hash function can only, at best, guarantee the latter (i.e. computational randomness). One might say, for all intents and purposes, we only care about computational randomness anyways. The problem is, cryptographic hash functions only work under complexity assumptions that we don't know to be true. These complexity assumptions have a similar flavor to the P vs NP question, which you might've heard of.

On the other hand, we can actually outright prove that randomness extractors work, and they output information theoretic randomness -- no complexity assumptions required!

So, long story short, we'd like to theoretically ensure, and prove outright, that we can take weak sources of randomness, and amplify them to strong sources of randomness. Randomness extractors do just that.