Hacker News new | ask | show | jobs
by steerablesafe 1552 days ago
This method generates approximately

  log_2(sqrt(1/(2*pi*p*(1-p))) - 1000*log_2(p^p * (1-p)^(1-p))
bits of entropy from 1000 coin flips, where "p" is the probablity of flipping heads.

Neumann's method generates 1000/(p(1-p)) bits of entropy from 1000 coin flips. The theoretical maximum is

  -1000*log_2(p^p * (1-p)^(1-p))
but it requires knowing "p" exactly. This method is close to the theoretical maximum. I didn't plug in numbers, or analyise further, but it's far better than Neumann's method.

One obvious drawback is that you have to flip the coin a 1000 times to produce the first unbiased random bit, while Neumann's method starts producing bits much earlier.

1 comments

> One obvious drawback is that you have to flip the coin a 1000 times to produce the first unbiased random bit, while Neumann's method starts producing bits much earlier.

Definitely. Though you could fix that problem relatively easily.

I think you might even be able to run von Neumann's method first, but store the coin flips; and then once you've got enough stored, extract a few more bits from the already used flips.

Perhaps like this:

When you do two flips, you add one of three possible tokens to your list:

'double-heads', 'double-tails' or 'mixed'.

Crucially, you only store 'mixed' and not whether it was 'head-tails' or 'tails-heads' because that information was already used to produce the von-Neuman-bit.

After your list has 1000 entries, you run an algorithm a bit like what I originally described to extract bits. The complication is that the table you construct has all possibilities of shuffling a fixed number of 'double-heads', 'double-tails' or 'mixed' tokens.

It feels like an awkward middle ground. Like this method has a limited entropy output for the first 2000 coin flips (first 1000 double-heads/double-tails/mixed entries), and then suddenly it adds back a ton of lost entropy.

An other commenter linked to some papers for asymptotically optimal entropy generation, I wonder if there is more of a streaming method there. It feels like there has to be, even maybe after a slow start. My naive intuition is that after 1000000 coin flips you have a good idea what p is, and then you can basically do arithmetic coding from there. Of course a theoretically correct method can't do exactly this, but it might asymptotically approach it.

Oh, if you want something practical, the approach that Linux's /dev/random takes is probably the one to go.

/dev/random being unbiased relies on some assumptions about cryptography; but in practice these assumptions are at least as well-founded as our assumption that our coin flips are independent.

Look at some of the papers mentioned in other comments on this submission. There are (near) optimal streaming methods.