Hacker News
by fmela 4039 days ago
> [F]or any study that requires sampling ... making sure we have enough data to ensure confidence in results is absolutely critical.

Is this necessarily true if you can sample from the population in a fair and unbiased way?

2 comments

> making sure we have enough data to ensure confidence in results is absolutely critical.

Yes, it's necessarily true. If your sample is small you are necessarily subject to large sampling error.

In essence: the individuals you happened to pick (even fairly) are overrepresented, and the rest are underrepresented.

Yes, of course, I can see that if you have an extremely small sample, then the _resolution_ of your results will suffer. However, I think it's much more important to ensure unbiased sampling than it is to ensure a large sample size.

For example, if you sample 1% of the population in a fair and unbiased way, that would tell you something with a much higher degree of confidence than if you sampled even 10% of the population in a biased way (or in a way such that you don't know whether you are biased or not).
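To put rough numbers on that comparison, here is a minimal sketch in Python. Everything specific in it is an assumption made up for illustration: a true support of 65%, a hypothetical population of 100,000 (so 1% is 1,000 people and 10% is 10,000), and a bias where supporters are twice as likely to end up in the sample as non-supporters. It also ignores the finite-population correction.

```python
import math

def expected_estimate(p, response_ratio=1.0):
    """Expected sample share for a candidate with true support p, when
    supporters are `response_ratio` times as likely to be sampled as
    non-supporters (1.0 means an unbiased sample)."""
    return p * response_ratio / (p * response_ratio + (1 - p))

def standard_error(q, n):
    """Standard error of a sample proportion with expected value q and size n."""
    return math.sqrt(q * (1 - q) / n)

p = 0.65  # hypothetical true support

# (label, sample size, response ratio); sizes assume a population of 100,000
scenarios = [("unbiased 1%", 1_000, 1.0), ("biased 10%", 10_000, 2.0)]

for label, n, ratio in scenarios:
    q = expected_estimate(p, ratio)
    se = standard_error(q, n)
    print(f"{label}: expected {q:.3f}, bias {q - p:+.3f}, std. error {se:.3f}")
```

Under these assumptions the larger sample has the smaller standard error, but its expected value is off by roughly 14 percentage points, and collecting more responses the same way does not shrink that offset.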

If you have a sample of 10 people, 1 person represents 10% of the sample. The opinion of one person can swing your results by 10 percentage points.

For a sample of 100, it's 1%. For 1000, it's 0.1%. The more opinions you can collect, the less they individually mean.

Yes, but it's the resolution that would suffer, not necessarily the result. For example, if 65% of the population would vote for candidate 1, an unbiased sample of size 10 would indicate that either 60% or 70% of the population would vote for candidate 1. A sample that is biased could literally tell you anything, regardless of how large it is (in absolute numbers).

Not every time, surely? A sample is randomly taken, so there will be variation. Even if it's unbiased, your samples will swing between all possible extremes. So you need to take a very large number of small samples.

The result of your sample would typically be anything between 30% and 100%.

Whereas, if you instead take a sample of 1000, you typically get results between 61% and 69%.

Source: http://www.wolframalpha.com/input/?i=binomial%2810%2C0.65%29...
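For anyone who wants to check those ranges without Wolfram Alpha, here is a minimal sketch in Python using the exact binomial distribution. Reading "typically" as a central 99% range is my assumption, not something stated above, and the function names are made up for this example.

```python
import math

def binom_pmf(n, k, p):
    """Exact Binomial(n, p) probability of k successes, computed in log
    space so that large n does not underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def central_range(n, p, coverage=0.99):
    """Return (lo, hi) sample proportions after trimming at most
    (1 - coverage) / 2 of probability from each tail."""
    tail = (1 - coverage) / 2
    pmf = [binom_pmf(n, k, p) for k in range(n + 1)]

    # Drop outcomes from the low end while the trimmed mass stays within the tail.
    lo, trimmed = 0, 0.0
    while trimmed + pmf[lo] <= tail:
        trimmed += pmf[lo]
        lo += 1

    # Same from the high end.
    hi, trimmed = n, 0.0
    while trimmed + pmf[hi] <= tail:
        trimmed += pmf[hi]
        hi -= 1

    return lo / n, hi / n

for n in (10, 1000):
    lo, hi = central_range(n, 0.65)
    print(f"n={n}: {lo:.1%} to {hi:.1%}")
```

With the 99% assumption, a sample of 10 gives 30% to 100%, and a sample of 1000 gives roughly 61% to 69%, in line with the figures quoted above.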

You can't take an "unbiased" sample in the sense that you mean here.

There's only one way you can take a completely unbiased sample: if you know exactly how everyone will vote in advance and select them carefully on that basis. But if you already know how everyone will vote in advance, then sampling is a fruitless exercise.