Very nicely put, and thank you for putting that together!
This patch goes a long way toward eliminating a long overdue userspace
crypto footgun. After several decades of endless user confusion, we will
finally be able to say, "use any single one of our random interfaces and
you'll be fine. They're all the same. It doesn't matter." And that, I
think, is really something. Finally all of those blog posts and
disagreeing forums and contradictory articles will all become correct
about whatever they happened to recommend, and along with it, a whole
class of vulnerabilities eliminated.
With very minimal downside, we're finally in a position where we can
make this change.
Seems like a no-brainer to me; as I understand, this effectively make urandom “always secure”, which is a very good thing; makes for an even more convincing argument when some colleague insists on using /dev/random “because it’s more secure”.
Do I understand correctly that with this patch, the only drawback is for weird architectures that do not have an instruction counter or another instruction to gather random data? And in those cases, the old behavior (“urandom may be insecure shortly after boot”) may even be worse than the new behavior (“may block while waiting for entropy”) ?
Right, it unifies /dev/urandom, /dev/random, and getrandom(flags=0) to all do exactly the same thing.
Most modern userspaces already use getrandom(flags=0). Nothing changes for them. They already count on the rng being seeded in one way or another.
Rather, this changes /dev/urandom, which previously would give insecure randomness before being seeded. With this change, this doesn't happen any more, because it makes /dev/urandom wait until it has been seeded.
In practice, the RNG get seeded by a large variety of things. As a last ditch effort, the Linus Jitter Dance will seed it.
Taken together, what all the above amounts to is that the regression potential is limited to systems where: (A) /dev/urandom is still being used, rather than getrandom(flags=0), (B) the boot sequence, due to some bug, hard-depends on unseeded reads from /dev/urandom, (C) no ordinary sources of entropy, such as interrupts and input devices and disk drives, are available, (D) the CPU is so ancient as to be missing a cycle counter, defeating the last ditch Linus Jitter Dance, and (E) a new kernel will be installed on this old system.
I argue that the set of machines where (A), (B), (C), (D), and (E) all hold is minuscule.
Doesn't this mean that non-blocking reads from /dev/urandom can now potentially return -EAGAIN (at e.g. very early boot time)? I think that's enough to subtly (nondeterministically) break userspace, in the short time window the entropy pool is not seeded enough, even if (C) and (D) do not hold.
Wasn't there many cases where they tried to make /dev/urandom wait until entropy & it broke many distros on certain machines? What's different this time around?
This is probably way outside of my sphere of competence, but....
If your approach is adopted, people would simply treat /dev/random and /dev/urandom as the same thing (which I gather is your intended goal). That is fine as long as CSPRNGs are relatively easy to make. I hear that this hinges on fancy theorems like P=BPP being true, but apparently they're not proven yet.
What if... in some parallel universe it turns out that P!=BPP, and the concept of CSPRNGs is fundamentally broken, and somebody discovers a practical method to break whatever PRNG is implemented in a system? In this admittedly unlikely universe, keeping the distinction between /dev/random and /dev/urandom (i.e. the former could block indefinitely, the latter could be insecure) seems to be the safer approach. Of course in this universe, Linux would still have to pull the PRNG from /dev/random and revert back to the old behavior, but at least it's fixable. But if userland drops the distinction between /dev/random and /dev/urandom, then the problem would be fundamentally unfixable until every app reviews and decides which guarantee they want for themselves (and releases patches).
Of course your patch does not really imply the contract between the kernel and userland has changed, which is why I mentioned intention. If it is intended to change the contract, maybe it's better to wait for P=BPP before you do it? :P
You're already misunderstanding how /dev/random and /dev/urandom work today, or indeed ever worked. Both devices have always read from the output of a CSPRNG.
It used to be that /dev/random did some accounting to try to estimate how much entropy was in its pool, and if that number declined too low (as reading from /dev/random was figured to decrease the entropy), it would simply refuse to run its CSPRNG to produce any more output until it get fed some more entropy. This accounting was heavily criticized for being magic thinking and unsupported by any actual research, and revisions to the randomness engine in Linux over the past decade have eventually eliminated this entropy accounting in favor of just tracking how much has ever been added--if there's not enough, then it blocks until there is.
I don't think I have that misunderstanding. My question is, do you have proof that:
1. the CSPRNG in Linux is secure, and
2. CSPRNGs in general exists ?
Fixing #1 simply requires changing to another algorithm.
Fixing #2 requires a secure RNG to block for entropy, and if the distinction between /dev/random and /dev/urandom goes away, then this scenario will cause problems _if_ it happens. I said it's very unlikely, but I don't think I should get this uncharitable response by pointing out the issue.
The proof of #2 is encryption exists. You can build a CSPRNG out of a cipher that's secure against chosen plaintext attack (trivial construction: encrypt a counter with your seed key). We haven't necessarily proven that encryption exists in the fully theoretical sense, but if you're considering the possibility that CSPRNGs don't exist, that means you have to simultaneously consider that encryption itself isn't meaningfully possible.
Applications trust /dev/urandom to be secure. If your scenario ends up being true, then instead of /dev/random acting like /dev/urandom, /dev/urandom should act like /dev/random since it is supposed to be secure, and we're back to having no distinction between the devices.
That is decidedly not true. /dev/urandom is not guaranteed to be secure upon boot before enough entropy is gathered by the system, but it is guaranteed to not block indefinitely. The patch changes this contract by making /dev/urandom guaranteed secure and maybe block indefinitely if some unlikely edge case is encountered.
If CSPRNGs are impossible, then you will have no CSPRNGs so both /dev/random and /dev/urandom will be insecure and will provide no guarantees solely due to impossibility of CSPRNGs.
Very nicely put, and thank you for putting that together!
This patch goes a long way toward eliminating a long overdue userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities eliminated.
With very minimal downside, we're finally in a position where we can make this change.