Hacker News new | ask | show | jobs
by antics 4366 days ago
When you step back to think about it, a huge amount of CS theory papers rely on this hand-wavy definition of "randomness" to prove something. Oddly, though, if you actually press a CS theory person for a mathematical definition of this concept, they virtually always draw a blank.

As a student I remember thinking that this was incredibly alarming. Without a good understanding of this concept, how can we be sure any of what we're saying is even remotely correct?

What Knuth has given us here is the ideal treatment of the subject, essentially putting the question of what randomness is, to rest for good. He starts by taking a comprehensive, entertaining history of the landscape (in short: researchers ping-ponged between having definitions of randomness so strict that nothing was random, and so loose that they were all pathologically predictable) before finally proving a theorem that completely solves the problem.

It is a monumental accomplishment in the field, and it is quite shocking to me that it's still so obscure. It's one of my favorite results in all of CS, anywhere.

If you haven't had the chance, I highly recommend the rest of this volume (Seminumerical Algorithms). It is just stuffed with other surprising, amazing results.

3 comments

Semi-Numerical Algorithms is Knuth's Silmarillion to his jaunty Hobbit of Volume I and the serious Lord of the Rings questing that is Volume III. OK, I know it's a stretch, but Semi-Numerical Algorithms is more of a back story...these days writing a floating point implementation is much less likely than a linked list or a binary search.

But picking up a like new used copy from Amazon for less than $10 with shipping made it to good to pass up, and like the rest of Knuth, every time I read a few pages I learn something new.

John Denker (ex Bell Labs) has a good discussion on devices which rely on random distributions:

http://www.av8n.com/computer/htm/secure-random.htm

---

"My favorite proverb of all is the one that says for every proverb, there is an equal and opposite proverb. In this case, we should note that the proverb about not putting all your eggs in one basket is not necessarily a sound engineering principle. The right answer depends on the margin of error and on the per-basket failure rate. It also greatly depends on the chance of correlated, uncorrelated, and anti-correlated basket failures. Sometimes the best way to get your eggs from point A to point B is to put them all in one basket and take really good care of that basket."

---

What? The modern definition of pseudorandomness (https://en.wikipedia.org/wiki/Pseudorandomness#Pseudorandomn...) was figured out in the 80s through the works of Blum, Goldwasser, Micali, Goldreich, etc. and is not hand-wavy at all. It's pretty rigorous and reliable.
You are 100% mistaken that this implies we had a good definition of random sequences before Knuth. In the article you link, they discuss the uniform distribution, but any distribution (and the modern notion of probability in general) absolutely depends on a mathematically precise notion of random sequences.

If you don't believe me, read the chapter. Early probability theorists (e.g., von Mises, Kolmogorov) literally started thinking about randomness in order to define probability.

EDIT: And, I don't suppose it's worth pointing out that pseudorandomness is not at all the same thing as randomness. The fact that you seem to use them interchangeably is not a good sign IMHO.

EDIT 2: Why the unexplained downvote, HN? :(

I skimmed through the extract presented (didn't have time to go into detail) but I don't see a formal definition of any kind presented in the extract. Could you point me to where it is? And if it's not in the extract, then could you quote it here?

Pseudorandomness is not the same thing as randomness but most algorithms today work on pseudorandom numbers so the concept is important. My impression was that that's what you were referring to.

PS :- FYI I didn't downvote your comment. Actually upvoted as your post made me discover some new math (various notions of randomness by kolmogorov, von mises, martin-lof) :)

So, the excerpt here is chapter 3, section 1. The actual definition happens in chapter 3 section 5 ("What Is a Random Sequence?"). I have the book at home, though, so I can't quote it here. Sorry. But the intuition is, if you have an infinite sequence of random numbers, then the numbers in all infinite subsequences should be equidistributed. So, like, if you a stream of random 0's and 1's, then if you pick only every other number, the 0's and 1's have to be equiprobable, and if you pick every third number, they still have to be equiprobable, etc. This is slightly wrong, but it's on the right track to the actual definition.

re: Pseudorandomness, the point of pseudorandomness is the following.

1. A lot of algorithms use randomness to make pathologically bad cases extremely unlikely. For example, choosing a random pivot in quicksort makes the worst case very unlikely.

2. But in a lot of cases, this leads to huge amounts of space consumption. For example, most frequency moment estimations involve a matrix of random numbers. So if you're getting those numbers from a "truly random" source, then you have to store the entire matrix, which can be huge.

3. So, a better solution is to use a pseudorandom number generator! That way you can store a seed of s bits, and do something clever, like deterministically re-generate the matrix as you need it, rather than storing it outright.

Notice though, that this is not independent of the notion of randomness! In fact they are quite intimately tied together.

Your definition relies on the notion of probability though. So I'm not sure why you seemingly view Knuth's work as more fundamental than Kolmogorov's, etc.
Because the trick Knuth pulls is to express this intuition without appealing to the definition of probability. It's quite clever.