Yeah basically. That could be a disaster for, say, nonce generation.
The solution would be to have multiple independent entropy pools and either bind them to cores(/sets of cores) or pick a non-busy one in a contention case.
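To make the idea concrete, here is a rough user-space sketch in Go of "several independent pools, pick a non-busy one". It is only an illustration of the locking pattern, not how the kernel does or should do it. The names `shard`, `shardedSource` and `next` are made up for this example, and each shard wraps a `math/rand` generator purely as a stand-in for a real pool, so this is not suitable for nonces as written.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
	"sync"
)

// shard is a hypothetical stand-in for one independent pool.
// It uses math/rand only to keep the sketch short; it is NOT a CSPRNG.
type shard struct {
	mu  sync.Mutex
	rng *mrand.Rand
}

// shardedSource holds several shards so that callers rarely wait on each other.
type shardedSource struct {
	shards []*shard
}

// newShardedSource creates n shards, each seeded separately from crypto/rand.
func newShardedSource(n int) (*shardedSource, error) {
	s := &shardedSource{shards: make([]*shard, n)}
	for i := range s.shards {
		var b [8]byte
		if _, err := crand.Read(b[:]); err != nil {
			return nil, err
		}
		seed := int64(binary.LittleEndian.Uint64(b[:]))
		s.shards[i] = &shard{rng: mrand.New(mrand.NewSource(seed))}
	}
	return s, nil
}

// next takes the first shard that is not currently locked.
// If every shard is busy, it falls back to waiting on shard 0.
func (s *shardedSource) next() uint64 {
	for _, sh := range s.shards {
		if sh.mu.TryLock() { // TryLock needs Go 1.18 or newer
			v := sh.rng.Uint64()
			sh.mu.Unlock()
			return v
		}
	}
	sh := s.shards[0]
	sh.mu.Lock()
	defer sh.mu.Unlock()
	return sh.rng.Uint64()
}

func main() {
	src, err := newShardedSource(4)
	if err != nil {
		panic(err)
	}
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				src.next()
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Binding shards to cores instead of probing for a free one would remove the `TryLock` loop, but Go does not expose the current CPU, so the sketch uses the "pick a non-busy one" variant.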
Yes, if there is no urandom generator per core, it would be convenient in some extreme cases to introduce one. The question is whether it's worth the effort and the resulting "bloat" in kernel code and memory usage. Linux runs on some very small devices too, and even there decent user-space programmers can easily do their own per-thread generation in their programs. Normal uses of crypto go like this: you initialize your own crypto once, then produce a lot of data in your own space.
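For illustration, "initialize once, then produce data in your own space" could look roughly like the Go sketch below: one read from the kernel via `crypto/rand` for the key, then AES-CTR keystream generated locally. The type `localGen` and its methods are hypothetical names for this example. It has no reseeding and no fork handling, so treat it as a sketch of the pattern rather than a vetted generator.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	crand "crypto/rand"
	"fmt"
)

// localGen is a hypothetical per-goroutine generator. It is keyed once from
// the kernel and afterwards produces output without any further syscalls.
type localGen struct {
	stream cipher.Stream
}

// newLocalGen reads 48 bytes from crypto/rand: a 32-byte AES-256 key
// followed by a 16-byte initial counter block for CTR mode.
func newLocalGen() (*localGen, error) {
	var seed [32 + aes.BlockSize]byte
	if _, err := crand.Read(seed[:]); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(seed[:32])
	if err != nil {
		return nil, err
	}
	return &localGen{stream: cipher.NewCTR(block, seed[32:])}, nil
}

// fill overwrites p with keystream bytes (zeroes XORed with the keystream).
func (g *localGen) fill(p []byte) {
	for i := range p {
		p[i] = 0
	}
	g.stream.XORKeyStream(p, p)
}

func main() {
	g, err := newLocalGen()
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 16)
	g.fill(buf)
	fmt.Printf("%x\n", buf)
}
```

A `localGen` is not safe for concurrent use, which is the point: each thread or goroutine would own one, so there is nothing to contend on.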
If urandom is really "one for all cores" somebody should be able to demonstrate the speed drop by just writing some bash script? Volunteers?
It seems to work in part. For /dev/urandom, I always see roughly the same throughput:
$ time dd if=/dev/urandom of=/dev/null bs=1 count=10000000
real 0m10.640s
user 0m0.696s
sys 0m9.940s
$ time (for i in $(seq 1 50); do dd if=/dev/urandom of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
real 0m11.199s
user 0m1.232s
sys 0m42.828s
$ time (for i in $(seq 1 500); do dd if=/dev/urandom of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
real 0m11.234s
user 0m1.252s
sys 0m42.536s
whereas for /dev/zero:
$ time dd if=/dev/zero of=/dev/null bs=1 count=10000000
real 0m3.268s
user 0m0.660s
sys 0m2.604s
$ time (for i in $(seq 1 50); do dd if=/dev/zero of=/dev/null bs=1 count=200000 2>/dev/null & done; wait)
real 0m2.550s
user 0m1.192s
sys 0m8.760s
$ time (for i in $(seq 1 500); do dd if=/dev/zero of=/dev/null bs=1 count=20000 2>/dev/null & done; wait)
real 0m2.612s
user 0m1.228s
sys 0m8.112s
Of course, the bash for-loop here together with the forking has some considerable overhead, so these values should likely be interpreted carefully (Linux 3.14-rc7, Core i5 520M).
>This patch solves a problem where simultaneous reads to /dev/urandom can cause two processes on different processors to get the same value. We're not using a spinlock around the random generation loop because this will be a huge hit to preempt latency. So instead we just use a mutex around random_read and urandom_read. Yeah, it's not as efficient in the case of contention, if an application is calling /dev/urandom a huge amount, it's there's something really misdesigned with it, and we don't want to optimize for stupid applications.
If you're using crypto/rand to yank a whole bunch of random numbers out for the purpose of deciding which DNS record to use when multiple DNS records were returned, yes, the Go application is misdesigned. Such applications should be using math/rand. Seeding your math/rand from crypto/rand isn't a bad idea, but you don't need to be hammering on /dev/urandom in such code.
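A minimal sketch of what that could look like, assuming a list of already-resolved records: read from `crypto/rand` exactly once for the seed, then use `math/rand` for every later choice. The function names `newSeededRand` and `pickRecord` and the example addresses are made up for illustration.

```go
package main

import (
	crand "crypto/rand"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// newSeededRand reads 8 bytes from the kernel CSPRNG once and uses them
// to seed a fast, non-cryptographic generator.
func newSeededRand() (*mrand.Rand, error) {
	var b [8]byte
	if _, err := crand.Read(b[:]); err != nil {
		return nil, err
	}
	seed := int64(binary.LittleEndian.Uint64(b[:]))
	return mrand.New(mrand.NewSource(seed)), nil
}

// pickRecord chooses one record without touching the kernel generator.
// records must be non-empty.
func pickRecord(r *mrand.Rand, records []string) string {
	return records[r.Intn(len(records))]
}

func main() {
	r, err := newSeededRand()
	if err != nil {
		panic(err)
	}
	records := []string{"192.0.2.1", "192.0.2.2", "192.0.2.3"}
	for i := 0; i < 5; i++ {
		fmt.Println(pickRecord(r, records))
	}
}
```

Note that a `*rand.Rand` created with `rand.New` is not safe for concurrent use, so each goroutine would need its own instance or a lock around it.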
This "some random data is more important than other random data" musical chair dance going on with /dev/random vs /dev/urandom vs userland [CS]PRNGs (often gathering from extremely poor sources, or using broken algos) has been nothing short of an unmitigated security and usability disaster.
We have the ability to make the /dev/urandom CSPRNG secure enough and fast enough for (almost) any randomness purpose. We need to cut all the rest of this insane crap.
People choose the wrong RNGs and get burned, or won't use the right ones because of speed or imaginary entropy exhaustion issues. This matters.
It's impossible (or just not worth the trade-off) to make one piece of software (this time, the kernel) fast for every imaginable use case. In this case, the kernel behaved correctly, with a speed degradation only in extreme cases. If the author's Rube Goldberg machine then runs slowly, I don't consider the kernel to be at fault.
The guy uses PHP and instead of built-in HTTPRequest he uses curl to make a request to "a bucketed key-value store built on PostgreSQL that speaks HTTP which uses Clojure and the Compojure web framework to provide a REST interface over HTTP." A bit of shooting the flies with cannons on every side?
On the other hand, if it can be shown that urandom has serious problems in reasonable use cases, it should be checked what can be changed and how.
You don't need to be, but why not? It should be plenty fast and work well. If it's turning out to be too slow due to too much locking, that should be fixed.
2 - You don't need the crypto qualities of it and you're emptying the entropy pool for nothing
3 - You're doing much more work, especially if you're reading one byte at a time from /dev/urandom (doing a syscall, etc), while rand is just a calculation
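To illustrate the per-byte cost in point 3: if you do want bytes from the kernel generator, you can at least fetch them in blocks. A rough Go sketch, assuming a 4096-byte buffer is acceptable (the size and the helper name `randomBytes` are arbitrary choices for this example):

```go
package main

import (
	"bufio"
	crand "crypto/rand"
	"fmt"
)

// randomBytes returns n random bytes. It hands them out one at a time,
// like a bs=1 reader would, but the bufio layer refills from crypto/rand
// in blocks of up to 4096 bytes instead of going to the kernel per byte.
func randomBytes(n int) ([]byte, error) {
	br := bufio.NewReaderSize(crand.Reader, 4096)
	out := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		b, err := br.ReadByte()
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	data, err := randomBytes(10000)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data))
}
```

The trade-off is that unused random bytes sit in a user-space buffer for a while, which may or may not matter for your use.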
Yes, it should be fixed. Yes, it's still a "misdesign" to use the cryptographic random number generator when you just want "a" pseudo-random number, right now. For choosing which of the several DNS answers you use, you could pretty much get away with keeping a counter and returning that counter modulo the number of choices. It's technically wrong for several reasons, but you could get away with it. That's how low-impact this random number usage is. Using a cryptographically secure random number generator for such a task is always going to be overkill.
So, would a possible solution be to check how many processes are using the random generator at once? If only one process is currently accessing /dev/urandom, then avoid the spinlock and the problem is solved. Or am I completely wrong?