by closed 2755 days ago
I don't understand why you would need a sample of a certain size. Setting a significance threshold at 5% takes sample size into account. For example, if I ran a permutation test with a sample size of 5 in each group, it could never be significant at that threshold, and "never" is < 5%!

A small sample size would lower your power to detect meaningful differences, which the original scenario doesn't have (by definition).

(If distributional assumptions, etc., are violated, then that's a different story!)


You need to make sure you have enough samples in order to know if you rejected the null hypothesis by chance. Stopping your test early is a form of p-hacking. See:

https://heapanalytics.com/blog/data-stories/dont-stop-your-a...

Peeking at your data and calculating the sample size you need for a test are separate statistical issues. I agree that peeking messes up significance levels :).
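
To make the peeking point concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration, not something from the linked post: both groups are drawn from the same normal distribution (so every rejection is a false positive), the test is a two-sided z-test with known unit variance, and "peeking" means checking after every 10 observations per group and stopping at the first p < .05.

```python
import math
import random

def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in each group.
    """
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_max=100, step=10, alpha=0.05, n_sims=2000, seed=0):
    """Return (false positive rate with peeking, rate with one fixed-size test)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_max)]
        b = [rng.gauss(0, 1) for _ in range(n_max)]
        # Peeking: stop and declare a difference at the first look with p < alpha.
        looks = range(step, n_max + 1, step)
        if any(z_test_p_value(a[:n], b[:n]) < alpha for n in looks):
            peeking_hits += 1
        # Fixed sample: test once, at the planned sample size.
        if z_test_p_value(a, b) < alpha:
            fixed_hits += 1
    return peeking_hits / n_sims, fixed_hits / n_sims

peeking_rate, fixed_rate = simulate()
print(f"peeking: {peeking_rate:.3f}  fixed n: {fixed_rate:.3f}")
```

The fixed-size test should land near the nominal 5%, while the peeking rate comes out well above it, because each extra look is another chance to cross the threshold.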

The point I was trying to make was that you can decide to run a test with a very small sample (e.g. n = 5), and it will still have the type 1 error rate you set if you chose a significance level of .05.
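
A minimal sketch of that claim, under assumptions of my own choosing: an unpaired, two-sided exact permutation test on the difference in means, with both groups drawn from the same standard normal distribution so that every rejection is a type 1 error. The function names are made up for this example.

```python
import itertools
import random

def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = a + b
    n_a = len(a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(idx):
        sum_a = sum(pooled[i] for i in idx)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(range(n_a))
    splits = list(itertools.combinations(range(len(pooled)), n_a))
    # Small tolerance so that floating-point noise does not hide exact ties,
    # such as the mirror-image split of the observed grouping.
    extreme = sum(abs_mean_diff(idx) >= observed - 1e-12 for idx in splits)
    return extreme / len(splits)

def false_positive_rate(n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Share of null simulations in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if permutation_p_value(a, b) <= alpha:
            rejections += 1
    return rejections / n_sims

rate_n3 = false_positive_rate(n_per_group=3)
rate_n5 = false_positive_rate(n_per_group=5)
print(f"n=3 per group: {rate_n3:.3f}  n=5 per group: {rate_n5:.3f}")
```

In this unpaired setup, 3 per group gives only 20 possible splits, so the smallest two-sided p-value is 0.1 and the test never rejects. With 5 per group there are 252 splits, and the test rejects a bit under 5% of the time. Either way the type 1 error rate stays at or below the level you chose; what the small sample costs you is power.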

> You need to make sure you have enough samples in order to know if you rejected the null hypothesis by chance.

You do this when you choose the significance level (e.g. .05). The value needed to reject, given a significance level, is a function of sample size.
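
As an illustration, here is a small sketch using a different test than the one discussed above, picked only because the arithmetic is easy to check: a one-sided exact binomial test of a fair coin. `critical_count` is a hypothetical helper that returns the smallest number of heads out of n flips that lets you reject at the given level.

```python
from math import comb

def critical_count(n, alpha=0.05):
    """Smallest k with P(X >= k) <= alpha for X ~ Binomial(n, 0.5).

    Returns None when even n heads out of n is not extreme enough to reject.
    """
    tail = 0.0
    critical = None
    for k in range(n, -1, -1):
        tail += comb(n, k) / 2 ** n
        if tail > alpha:
            break
        critical = k
    return critical

for n in (4, 5, 10, 20):
    print(n, critical_count(n))
# 4 None
# 5 5
# 10 9
# 20 15
```

The cutoff moves with n so that the chance of a false rejection never exceeds .05, and at n = 4 there is no cutoff at all: the test can never reject, which is also a false positive rate below 5%.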

The definition of Type 1 error on Wikipedia has a good explanation of this:

https://en.wikipedia.org/wiki/Type_I_and_type_II_errors