by ronaldx 4597 days ago
It's a bit awkward to give a full answer to this, but this is to the best of my understanding and explained as simply as is reasonable:

A small sample has less statistical 'power' to identify significant differences where they exist. Put another way, a large sample is more likely to give a true significant result than a small sample.
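To put rough numbers on the power point, here is a minimal sketch. It assumes a two-sided one-sample z-test with a known standard deviation, and the effect size of 0.5 standard deviations and the name `z_test_power` are illustrative choices, not anything from the original comment.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size: true mean shift in units of the (assumed known) standard deviation.
    n: sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Same true effect, growing sample: power rises with n.
for n in (10, 40, 160):
    print(n, round(z_test_power(0.5, n), 3))
```

With no true effect (`effect_size=0`) the function returns `alpha`, which is just the false positive rate.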

But, if you do see 10% significance(/90% confidence) in a small sample, this is just as good as 10% significance in a large sample. Although the cutoff point will be rougher in a smaller sample, it's good standard practice to round conservatively to account for this.

A 10% significance level is unlikely to be considered a strong result in either case: run 10 tests on effects that don't exist and you'd expect about one of them to come up "significant" by chance, and there's a danger you have unknowingly or unconsciously done something like this, for example by not deciding the sample size in advance. However, there's also presumably strong enough evidence against a harmful difference that you aren't likely to lose anything by following these results.
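The "10 tests on nothing" arithmetic can be sketched as follows, assuming the tests are independent and every null hypothesis is true. The function name `family_wise_error_rate` is just an illustrative label.

```python
def family_wise_error_rate(alpha, num_tests):
    """Chance of at least one false positive across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests

# Ten tests at the 10% level, with no real effect anywhere.
print(round(family_wise_error_rate(0.10, 10), 3))  # 0.651
```

So under those assumptions there is roughly a two-in-three chance of at least one spurious "significant" result.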

It can be a good idea to do numerous small investigative tests as justification for bigger tests, but relying on lots of small tests alone requires accounting for multiple testing (e.g. Bonferroni correction).
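For reference, a Bonferroni correction simply compares each p-value against alpha divided by the number of tests. A minimal sketch, where the p-values are made up for illustration and `bonferroni_reject` is a hypothetical helper name:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return a reject/keep decision per test using the Bonferroni threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

p_values = [0.004, 0.03, 0.20, 0.011, 0.049]  # made-up values for illustration
print(bonferroni_reject(p_values))  # [True, False, False, False, False]
```

Three of these would pass an uncorrected 5% cutoff and a fourth only just misses the corrected one, but only the first survives the corrected threshold of 0.01.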

1 comments

"But, if you do see 10% significance(/90% confidence) in a small sample, this is just as good as 10% significance in a large sample". That is not true, strictly speaking. You are assuming that a small sample describes the underlying distribution well. But this may not be the case, due to non-normality of the distribution itself or potential biases.
Cool point and I agree.

The sample has to represent the population, that's fundamental. If the sample is so small that it can't characterise the population distribution, then you have a problem anyway. If you're measuring events that happen 1% of the time (or 99% of the time), a sample of 100 is not nearly enough.
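A quick way to see why 100 is too few for a 1% event, assuming independent observations so the count is binomial (`prob_at_most` is an illustrative helper, not a library function):

```python
from math import comb

def prob_at_most(k, n, rate):
    """P(X <= k) for X ~ Binomial(n, rate)."""
    return sum(comb(n, i) * rate**i * (1 - rate) ** (n - i) for i in range(k + 1))

n, rate = 100, 0.01
print(round(prob_at_most(0, n, rate), 3))  # 0.366: no events at all
print(round(prob_at_most(1, n, rate), 3))  # 0.736: one event or none
```

More than a third of such samples contain no events, and nearly three quarters contain at most one, so the sample says very little about the rate.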

If you chose an appropriate non-parametric test to cover an unknown distribution with a small sample, it might well have zero power (impossible to give a significant result).
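One way to see the zero-power case is the floor on the p-value of an exact rank-based test. The sketch below assumes a two-sided exact rank-sum (Mann-Whitney) test with no ties. The function name `min_two_sided_p` is illustrative.

```python
from math import comb

def min_two_sided_p(n1, n2):
    """Smallest two-sided p-value an exact rank-sum test can produce.

    Assumes no ties: under the null, the most extreme split of ranks is one of
    comb(n1 + n2, n1) equally likely arrangements, and a two-sided test counts
    both extremes.
    """
    return min(1.0, 2 / comb(n1 + n2, n1))

# Equal group sizes of 3, 4 and 5 observations.
for n in (3, 4, 5):
    print(n, min_two_sided_p(n, n))
```

With three observations per group the floor is 2/20 = 0.1, so under these assumptions the test can never reach 5% significance however large the true difference is.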