| >assuming you have a perfect cryptographic hash function Except there is no such thing (not in the sense you need it) - it too would be a perfect source of randomness, and you're just pushing the problem down the road. As an example, NIST publication 800-90B recommends multiple randomness extractors, one of which is SHA. They recommend using twice the amount of entropy in as the entropy out to get random "enough" bits. Thus even SHA is not going to mix bits well enough as you want. (The things called perfect hash functions in the literature map N items into N slots with no collisions, which is not what is needed here). As a perhaps surprising counterexample to any simple solution, here [1] is a math paper with a proof that there can be no optimal algorithm that is best for all values of coin bias. Here's [2] a cool walkthough on some of the ideas in theory that have been investigated. [1] https://web.eecs.umich.edu/~qstout/abs/AnnProb84.html
[2] http://www.eecs.harvard.edu/~michaelm/coinflipext.pdf |
???
Okay, lets choose the simplest hash function of them all: a 1-bit XOR (resulting in a 1-bit "hash").
Lets say I flip the 75% biased coin 100 times, count heads as 1 and tails as 0. I XOR all the results together. What's the probability of XOR(all) == 1, and what's the probability that XOR(all) == 0?
You'll find that its quite close to 50/50, which is the limit to the amount of entropy you can pull from a 1-bit hash function.
---------
Now lets see for smaller values:
1 flip: 75% chance of 1, 25% of 0. Which is the limit of information so far.
2 flips: ... you know what? Imma write a program and get back to ya a bit later. I probably will edit my post.
------
2 flips: 62% chance of Parity0 3 flips: 43% chance of Parity0 4 flips: 53% chance of Parity0
Etc. etc. etc.
By 16 flips, its 49.9985% chance of Parity0, pretty close to unbiased. It gets pretty hard to distinguish this coin from an unbiased one.
--------
I'm not sure if you need a cryptographic hash actually. Its just that cryptographic hashes are close enough to random that its easier to use.
Now I'm curious if a CRC32 would efficiently extract 16-bits of entropy from biased coins. CRC32 is clearly not a cryptographic hash, but its pretty good as an "avalanche" due to the nature of GF-arithmetic.