Hacker News
by azakai 3364 days ago
A random sample of 400 is enough to give a good estimate, regardless of the size of the population.

This is a non-intuitive result from statistics. The standard error of the estimate - roughly, the typical size of the error - does not meaningfully depend on the population size. It could be 7 billion or 7 hundred (for a population that small, the error only gets smaller). The standard error essentially depends only on the sample size, and is O(1/sqrt(n)).
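To make that concrete, here is a rough simulation sketch, not anything from the study being discussed. The function name `sampling_error`, the 50/50 split in the population, and the number of trials are all assumptions chosen for illustration. It draws repeated samples of 400 from populations of very different sizes and measures how much the estimated share moves around.

```python
import random
import statistics

def sampling_error(pop_size, n=400, share=0.5, trials=2000, seed=0):
    """Spread (std dev) of the sample proportion over repeated samples of size n.

    Hypothetical helper: assumes a population in which `share` of the
    members have the value 1 and the rest have the value 0.
    """
    rng = random.Random(seed)
    ones = int(pop_size * share)  # members 0..ones-1 hold the value 1
    estimates = []
    for _ in range(trials):
        # Draw n distinct members; range() avoids building the whole population.
        sample = rng.sample(range(pop_size), n)
        estimates.append(sum(i < ones for i in sample) / n)
    return statistics.pstdev(estimates)

if __name__ == "__main__":
    for pop_size in (10_000, 1_000_000, 7_000_000_000):
        print(pop_size, round(sampling_error(pop_size), 4))
```

All three population sizes should come out close to 0.025, give or take simulation noise, even though they differ by several orders of magnitude.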

In this case, with n=400, we have 1/sqrt(400) = 0.05. If we are measuring a random variable with two values (say 0 and 1), then its standard deviation is at most 0.5, so the standard error is at most 0.5 * 0.05 = 0.025 - we expect a typical error of no more than about 2.5 percentage points. That's very good!
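The arithmetic above can be checked in a few lines. This is a minimal sketch; the function names are made up for illustration, and it assumes a simple random sample of a 0/1 variable.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion when the true share is p."""
    return math.sqrt(p * (1 - p) / n)

def worst_case_standard_error(n):
    """Upper bound over all p: sqrt(p * (1 - p)) peaks at 0.5 when p = 0.5."""
    return 0.5 / math.sqrt(n)

if __name__ == "__main__":
    print(worst_case_standard_error(400))  # the 0.025 bound for n = 400
    print(standard_error(0.1, 400))        # smaller when the split is lopsided
```

Note that 0.025 is one standard error. A conventional 95% margin is about twice that, so roughly plus or minus 5 percentage points for n=400.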

(The bigger question is whether the sample is random or not.)

1 comment

I agree on the bigger question. I thought I was communicating the lack of random samples, but I appreciate you pointing out why just referencing the sample size is not enough. Anyway, their procedure practically required a lack of randomness. Their sign-up took place at the same school and "requested that participants be traditional college students of heterosexual orientation". Then they obtained the older-generation participants by getting the first group to hand over addresses of their older relatives, neighbors and employers. I don't think it could get much less random.