Hacker News new | ask | show | jobs
by Animats 3229 days ago
Here's the site for the random number generator.[1] It's basically a simple linear congruential random number generator (well known, but not very good) fed into a mixer. The mixer is new.

Most of the analysis is about the LCG or the final output. The suggested mixer is just

    output = rotate64(uint64_t(state ^ (state >> 64)), state >> 122);
That's simple, and the insight in this paper is that something that simple helps a lot. I would have thought that you'd want a mixer where changing one bit of the input changes, on average, half the bits of the output. The mixer above won't do that. DES as a mixer would probably be better, but it's slower. The new result here is that something this simple passes many statistical tests.

This isn't crypto-grade; both that mixer and a LCG generator are reversible with enough work.

[1] http://www.pcg-random.org/

2 comments

>linear congruential random number generator (well known, but not very good)

Relevant quotes from the paper:

"But if you began reading the section with the belief that “linear congruential generators are bad” (a fairly widely-held belief amongst people who know a little about random number generation), you may have been surprised by how well they performed. We’ve seen that they are fast, fairly space efficient, and at larger sizes even make it through statistical tests that take down other purportedly better generators. And that’s without an improving step."

and

"Despite their flaws, LCGs have endured as one of the most widely used random- number generation schemes, with good reason. They are fast, easy to implement, and fairly space efficient. As we saw in Section 3.3, despite poor performance at small bit sizes, they continue to improve as we add bits to their state, and at larger bit sizes, they pass stringent statistical tests (provided that we discard the low-order bits), actually outperforming many more-complex generators. And in a surprise upset, they can even rival the Mersenne Twister at its principle claims to fame, long period and equidistribution."

"Nevertheless, there is much room for improvement. From the empirical evidence we saw in Section 3.3 (and the much more thorough treatment of L’Ecuyer & Simard [28], who observe that LCGs are only free of birthday-test issues if n < 16p1/3, where n is the number of numbers used and p is the period), we can surmise that we may observe statistical flaws in a 128-bit LCG after reading fewer than 247 numbers (which is more than BigCrush consumes but nevertheless isn’t that many—an algorithm could plausibly use one number per nanosecond and 247 nanoseconds is less than two days)."

> 247 nanoseconds is less than two days

Yes, yes it is.

For those just as confused as I was, replace all instances of 247 with 247

Hmmm... is HN removing ^carets?

Replace 247 with '2^47', Which is ~'1.28x10^14' or ~36 hours in ns.

Hah. I used double asterisks, should have seen that coming in hindsight. Muphry's Law.
I've often been frustrated trying to put more than one asterisk (not italic tag) in an HN comment.

Serious question, what happened to WYSIWYG? The Web has supported it for like 20 years, why have we standardized on clunky in-line markup for rich text entry, pretty much everywhere?

FWIW, you can indent text in a HN comment (by a few spaces), and you get

  a Blockquote like this, 
  and *asterisks* survive: 2**47
WYSIWYG is kinda crap on mobile. A checkbox saying "Don't format" would be nice, though.
> replace all instances of 247 with 247

That doesn't appear to help ;-)

Two-to-the-forty-seven nanoseconds is a little over thirty-nine hours.

(Let's see HN mangle that!)

I feel bad for the poor NLP tokenizer that will process this later and decide that two-to-the-forty-seven is a word, strangely unattested in any corpora.
It's a weird paper. It's quite long and takes an eternity to reach this simple point. It's also littered with "security concerns" which are confusing at best and misleading at worst; in reality, none of the generators it discusses are suitable for "sensitive applications", even in the corner-case scenario it discusses of needing permutations of all the b-bit integers, a problem we already have cryptographic tools to solve.

But I'll confess to not really understanding what all the fuss is about insecure generators.

Linear congruential generators (x = k1*x + k2, unsigned with truncation on overflow) have some strange properties, some of which the paper explores. If you repeatedly take three sequential values from one and treat them as 3D coordinates, the points line up in parallel planes. (There's an explanation of why in Knuth.) Something has to be done to destroy that order. The solution here is to XOR the upper and lower halves of the 128-bit state to get 64 bits, then circular shift that by the high 6 bits of the 128-bit input. This passes most of the classic tests for random number generators.

That's the paper, basically.

The paper is really two papers. The first paper is a thoughtful and careful explanation of how we should consider judging pseudorandomness, and how to measure various criteria, along with a survey of many popular ten-lines-or-less PRNGs which PCG competes with. I found it interesting.
> But I'll confess to not really understanding what all the fuss is about insecure generators.

Based on things she's said on her site and in comments on John D. Cook's blog, it's all about algorithmic complexity attacks on randomized algorithms.

In other words, if you're doing quicksort on external input with a random pivot, and someone knows the PRNG state, they can make a pathological input that'll trigger quadratic behavior.

I don't know how likely this is to happen, but I know there were similar attacks on hash tables a few years ago.

And we addressed it with actual cryptography: SipHash. Since cryptographic random number generators are generated with very similar primitives, why wouldn't SipHash be the answer to those problems as well?