Hacker News new | ask | show | jobs
by dsacco 3229 days ago
I have a few comments:

1. The paper itself[1] is extremely readable by the standards of most cryptography research. On one hand, this is great because I was able to follow the whole thing in essentially one pass. On the other hand, the paper is very long for its result (58 pages!), and it could easily do without passages like this one:

Yet because the algorithms that we are concerned with are deterministic, their behavior is governed by their inputs, thus they will produce the same stream of “random” numbers from the same initial conditions—we might therefore say that they are only random to an observer unaware of those initial conditions or unaware of how the algorithm has iterated its state since that point. This deterministic behavior is valuable in a number of fields, as it makes experiments reproducible. As a result, the parameters that set the initial state of the generator are usually known as the seed. If we want reproducible results we should pick an arbitrary seed and remember it to reproduce the same random sequence later, whereas if we want results that cannot be easily reproduced, we should select the seed in some inscrutable (and, ideally, nondeterministic) way, and keep it secret. Knowing the seed, we can predict the output, but for many generators even without the seed it is possible to infer the current state of the generator from its output. This property is trivially true for any generator where its output is its entire internal state—a strategy used by a number of simple random number generators. For some other generators, such as the Mersenne Twister [35], we have to go to a little more trouble and invert its tempering function (which is a bijection; see Section 5), but nevertheless after only 624 outputs, we will have captured its entire internal state.

That's a lot of setup for what is frankly a very basic idea. A cryptographer being verbose in their writing might briefly remind the reader of these properties with the first sentence, but they'd still likely do that with much more brevity than this. I understand wanting to make your research accessible, but for people who understand the field this detracts from getting to the "meat." It might make it harder to get through, but a 10-30 page result is preferable to a nearly 60-page one that assumes I know nearly nothing about the field. If I don't know these details very well, how can I properly assess the author's results?

2. The author's tone in her writing is something I take issue with. For example, passages like this one...

Suppose that, excited by the idea of permutation functions, you decide to always improve the random number generators you use with a multiplicative step. You turn to L’Ecuyer’s excellent paper [25], and without reading it closely (who has time to read papers these days!), you grab the last 32-bit constant he lists, 204209821. You are then surprised to discover that your “improvement” makes things worse! The problem is that you were using XorShift 32/32, a generator that already includes multiplication by 747796405 as an improving step. Unfortunately, 204209821 is the multiplicative inverse of 747796405 (mod 2 32), so you have just turned it back into the far-worse–performing XorShift generator! Oops.*

...go a bit beyond levity. If you're trying to establish rigorous definitions and use cases to distinguish between generators, functions and permutations, this isn't the way to do it. This isn't appropriate because it doesn't go far enough to formalize the point. It makes it intuitive, sure, and that's a great educational tool! But it's a poor scenario to use as the basis for a problem statement - research is not motivated by the failure of an engineer to properly read and understand existing primitives, it's motivated by novel results that exhibit superior qualities over existing primitives.

3. The biggest grievance I have with this paper is the way in which it analyzes its primitives for cryptographic security. For example, this passage under 6.2.2 Security Considerations:

In addition, most of the PCG variations presented in the next section have an output function that returns only half as many bits as there are in the generator state. But the mere use of a 2 b/2-to -1 function does not guarantee that an adversary cannot reconstruct generator state from the output. For example, Frieze et al. [12] showed that if we simply drop the low-order bits, it is possible for an adversary to discover what they are. Our output functions are much more complex than mere bit dropping, however, with each adding at least some element of additional challenge. In addition, one of the generators, PCG-XSL-RR (described in Section 6.3.3), is explicitly designed to make any attempt at state reconstruction especially difficult, using xor folding to minimize the amount of information about internal state that leaks out.17 It should be used when a fast general-purpose generator is needed but enhanced security would also be desirable. It is also the default generator for 64-bit output.

That's not a rigorous analysis of a primitive's security. It is an informal explanation of why the primitive may be secure, but it so high level that there is no proof based on a significant hardness assumption. Compare this with Dan Boneh's recent paper, "Constrained Keys for Invertible Pseudorandom Functions"[2]. Appendices A and B after the list of references occupy nearly 20 pages of theorems used to analyze and prove the security of primitives explored in the paper under various assumptions.

Novel research exploring functions with (pseudo)random properties is inherently mathematical; it's absolutely insufficient to use a bunch of statistical tests, then informally assess the security of a primitive based on the abbreviated references to one or two papers.

_________

1. http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf

2. https://eprint.iacr.org/2017/477.pdf

1 comments

Just to be clear: it's not a cryptography paper, is it? Did you figure out what journal it was submitted to?
She submitted it to ACM Transactions on Mathematical Software. I would personally consider it a cryptography paper, for three reasons:

1. She purports to introduce a novel result that bridges "medium-grade" performance characteristics and security characteristics in one primitive. In fact, if you look at the PCG Random website (pcg-random.org), she very clearly compares and emphasizes both performance and security characteristics with functions like xorshift and ChaCha.

2. We see cryptography papers submitted to all manner of theoretical CS conferences and journals, for example Symposium on the Theory of Computing, which are not uniformly crypto-focused.

3. She acknowledges herself that she found it hard to categorize her paper (it could be relevant for simulstion, it could be relevant for stream ciphers, etc) in a blog post about how she chose the venue: http://www.pcg-random.org/posts/history-of-the-pcg-paper.htm...

As a meta point I read the whole thing, and I actually think it would be a nice publishable result if it were, say 10 - 20 pages. But 60 is wild! It took me longer to get through this "accessible" paper than it did for me to get through any of Boneh's papers on constrained and puncturable pseudorandom functions!

It's definitely interesting, and sure, why not explore "medium-grade security" that makes explicit tradeoffs with performance and security. But the presentation seems like it was written by someone writing for a non-academic audience, and the content of 6.2.2 "Security Considerations" is really light on provable security.

TOMS has a long and storied history, mostly on numerical codes. I was looking through it in grad school in the late 80s/early 90s for better saner methods in numerical software.

Better PRNGs for simulation is definitely in its bailiwick. Crypto ... maybe less so.

If you stripped out all the (weird, random) cryptographic stuff from this paper, it would read pretty much the same and make pretty much the same points, which tells me: it's not a cryptographic paper.
That's a fair point; it was mostly the presence of the random crypto that annoyed me :)
O'Neill recently mentioned the crypto aspects of PCG in an comment on another post by John D. Cook. I'll just quote it below. But it looks to me like she thought that any analysis she did on the prediction difficulty of PCG wouldn't be well regarded.

Also, I notice that lots of people seem to think her paper was too long, but they also claim that it doesn't say enough about their favorite topic. That seems to be happening with you.

Direct quote from O'Neill's comment here https://www.johndcook.com/blog/2017/07/07/testing-the-pcg-ra...

John, in your post, you said that PCG has “excellent statistical and cryptographic properties”. I’ve learn it best never to say “cryptographic security” or “cryptographic properties” when trying to place something on a spectrum of prediction difficulty. Too many misunderstandings. Saying “prediction difficulty” still causes some crossed wires, but it’s about as good as we can go.

Dan & Dmitriy, just to be 100% clear, I HAVE NEVER RECOMMENDED PCG FOR CRYPTOGRAPHY. I do however, care about prediction difficulty, and I don’t like trivially predictable generators. Because my viewpoint is often misunderstood, I have some more blog posts in the pipeline about these issues, but the key thing is that general-purpose PRNGs get used for almost anything, from using the low-order bit to toss a coin, to supporting randomized algorithms. If someone can predict your generator, they can mount an algorithmic complexity attack on your randomized algorithm, tanking its performance. If predicting the generator is more costly than the algorithmic complexity attack itself, people will try their attacks on easier targets. But if the generator spends to much time being trying hard to be unpredictable, we also tank our performance, thus we have to try to strike a balance in a different place than we do for traditional cryptographic applications. We already live in a world where hash table implementations in scripting languages need to be hardened because of algorithmic complexity attacks; this is the next logical step.

All that said, there are members of the pcg family (not pcg32 and especially not pcg32_fast!) that I personally think would be really challenging to predict. I also find it frustrating that people don’t compare like with like. If you want to compare the prediction difficulty of PCG against a cryptographically secure PRNG, you need to compare a PCG variant at least broadly similar in size. Say, for example, we wanted to contrast PCG against the ChaCha PRNG from Orson Peters (perhaps with four rounds rather than the full 20), that PRNG is 104 bytes in size, so it’s fairest to compare it to pcg64_c8, which is 80 bytes, or pcg64_c16, which is 144 bytes.

Regarding prediction difficulty, I’m very well aware of Bruce Schneier’s law, “Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break. It’s not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis. And the only way to prove that is to subject the algorithm to years of analysis by the best cryptographers around.”, and his subsequent elaboration “Anyone can invent a security system that he himself cannot break. I’ve said this so often that Cory Doctorow has named it “Schneier’s Law”: When someone hands you a security system and says, “I believe this is secure,” the first thing you have to ask is, “Who the hell are you?” Show me what you’ve broken to demonstrate that your assertion of the system’s security means something.” Thus my personal thoughts on how difficult it is mean NOTHING in a cryptography context. I also can’t expect cryptographers to spend their time on my education, but if someone out does know a simple and efficient algorithm that can reliably break pcg64_c8, I really would love to see how. (Also, hey Bruce, gender-neutral language, it’s a thing.)

I can do at least one thing that adds a tiny tiny tiny bit of credibility in the eyes of folks like Bruce Schneier , I can show other PRNGs I have broken that might have seemed hard to predict to a casual observer. Mostly that won’t actually help though, because a cryptographer would say “Ha! That’s toddler level stuff!” and a mathematician might say “I can’t understand why you care about prediction at all”. Sometimes I feel there should be more people trying to occupy the middle ground. Meh.

But, let’s be clear, DO NOT USE PCG FOR CRYPTOGRAPHY. DO NOT USE PCG FOR CRYPTOGRAPHY. DO NOT USE PCG FOR CRYPTOGRAPHY. Clear?

I think we've surrendered some of the "clarity" argument with the sentences in that paper describing which exact instantiation of the non-cryptographic RNG to select for "sensitive" applications.

Even here, in this comment, it's really hard to follow what you're saying. For instance, you've taken the time to compare the "prediction difficulty" of the PCG PRNG to that of a ChaCha20-based DRBG. But ChaCha20 is a stream cipher, a cryptographic primitive. To be competitive with it at producing uncorrelated bits is to be yourself a cryptographic primitive; that is, to argue that an LCG and a trivial mixer is all we ever needed to encrypt data. That would be... newsworthy?

Also: if you're making an appeal to the cryptographic literature, there are better people to cite than Bruce Schneier.

ACM Transactions on Mathematical Software.