Hacker News new | ask | show | jobs
by joshuaellinger 2165 days ago
Weird -- it appears to be missing the last 10-20 years of development in random numbers by the data science and cryptography community. They have put a lot of work into it.

If I recall correctly there are two major angles, 1. there is an instruction in modern Intel chips that samples random thermal noise. 2. there are a whole class of elliptical curve approaches that pass a bunch of randomness tests.

And I'm just scratching the surface here.

4 comments

So this is not what the article is about.

It discusses a new method to generate integer values according to an arbitrary probability distribution, using as input a uniform random generator. Whether the input generator is truly random, pseudorandom, cryptographically secure etc, is irrelevant: the output will presumably only be as random/secure as the input RNG.

Admittedly, the article does a poor job at explaining what the FLDR method is about, and it looks as biased as the method itself (sorry for the easy pun). From my understanding of the paper [1], the method is better than the state-of-the-art Alias method [2] only when the entropy of the target distribution is low. When entropy is high, it performs similarly (or may even be a bit slower) but uses up to 8 times more memory space.

[1] https://arxiv.org/abs/2003.03830

[2] https://en.wikipedia.org/wiki/Alias_method

Don't bother with elliptic curves, any stream cipher will do. We make messages indistinguishable from random with those, we might as well use them for random number generation as well.

The arc4random_buf(3) function for instance uses the Chacha20 cipher under the hood (some outdated & broken implementations may still use RC4).

The article is not about randomness in general, once one reads the whole. It's just the title and the longish intro that can mislead.

It's about:

https://arxiv.org/abs/2003.03830

"The Fast Loaded Dice Roller: A Near-Optimal Exact Sampler for Discrete Probability Distributions"

"Empirical evaluations on a broad set of probability distributions establish that FLDR is 2x-10x faster in both preprocessing and sampling than multiple baseline algorithms, including the widely-used alias and interval samplers. It also uses up to 10000x less space than the information-theoretically optimal sampler, at the expense of less than 1.5x runtime overhead."

I think the algorithm is most likely worthwhile, but the article is terrible. Another instance of journalism adding a narrative meant to entertain but really just disinforms.

Also if I had to guess without studying the paper, the benefit of the sum of the weights being a power of two isn't just being efficient in memory or time, but rather efficient in use of the random input tape. Naively, if you want to generate a uniform number between 0-128 inclusive from an input of uniform bits, you consume 8 random bits and if they form a value within the range you return it. Otherwise (129-255) you throw all the bits away and start over. Using the leftover entropy seems like it should be possible somehow.