by kgwgk 2645 days ago
> They simply say that if rerunning the experiment again, it would be surprising to get a different result.

Not really. A low p-value says that the result you got was surprising, assuming that the null hypothesis is true. If the null hypothesis is true, it would be surprising to get the same result again (i.e. a result at least as extreme). If the null hypothesis is not true, the result would be less surprising (or possibly more surprising, if the true effect is in the “wrong” direction).

The result we got gives some evidence against the null hypothesis, but if the null hypothesis was very likely to be true beforehand, it may still be very likely to be true afterwards. In that case it wouldn’t be surprising to get a different result if the experiment were performed again.
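To make that concrete, here is a minimal sketch of the update using Bayes' rule. All three input numbers are assumptions picked purely for illustration (none of them come from the discussion), and `posterior_null` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """Posterior probability of the null hypothesis via Bayes' rule."""
    num = prior_null * p_data_given_null
    den = num + (1 - prior_null) * p_data_given_alt
    return num / den

# Assumed numbers, for illustration only:
prior_null = Fraction(999, 1000)      # the null is very likely a priori
p_data_given_null = Fraction(1, 100)  # the observed result is surprising under the null
p_data_given_alt = Fraction(1, 2)     # and fairly likely under the alternative

post = posterior_null(prior_null, p_data_given_null, p_data_given_alt)
print(post, float(post))  # 999/1049, roughly 0.95
```

Under these assumed inputs the null is still about 95% likely after seeing the "surprising" result, so a different outcome on a rerun would not be surprising at all.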

Illustration: I roll a die three times. I get three ones. P<0.01 (for the null hypothesis of a fair die and the two-tailed test on the average). This is not simply saying that if I roll the die three times again it would be surprising to get something other than ones.


> I roll a die three times. I get three ones. P<0.01 (for the null hypothesis of a fair die and the two-tailed test on the average).

Hmm. At a glance, that doesn't seem right. Yes, the chance of rolling three 1s is 1/6^3 = 1/216, but if we only rolled once and got a single 1, we wouldn't have any reason to suspect that the die was unfair. So maybe we should only consider the last two rolls, and conclude with p ≈ 0.03 (1/36) that the die is unfair? Otherwise, consider the case where we rolled 1, 5, 2: surely we shouldn't treat this series of non-repeated outcomes as p < 0.01 evidence of an unfair die?

If the die is fair, the average score will be 3.5. One can define a test based on that value and reject the null hypothesis when the average score is too low or too high.

The sampling distribution of the average can be calculated, and for three rolls the extreme values are 1 (three ones) and 6 (three sixes), which happen with probability 1/216 each. Getting three ones or three sixes is then a p = 2/216 ≈ 0.0093 result.
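For anyone who wants to check the arithmetic, the sampling distribution can be enumerated exactly. This is only a sketch: `mean_distribution` and `two_tailed_p` are hypothetical helper names, and "two-tailed" is taken here to mean "at least as far from 3.5 as the observed average", which is an assumption about how the test is defined.

```python
from fractions import Fraction
from itertools import product

def mean_distribution(faces, n_rolls):
    """Exact distribution of the average of n_rolls draws from equally likely faces."""
    dist = {}
    weight = Fraction(1, len(faces) ** n_rolls)
    for rolls in product(faces, repeat=n_rolls):
        mean = Fraction(sum(rolls), n_rolls)
        dist[mean] = dist.get(mean, 0) + weight
    return dist

def two_tailed_p(dist, observed_mean, expected=Fraction(7, 2)):
    """Probability of an average at least as far from `expected` as the observed one."""
    distance = abs(observed_mean - expected)
    return sum(p for m, p in dist.items() if abs(m - expected) >= distance)

fair = mean_distribution(range(1, 7), 3)

p_three_ones = two_tailed_p(fair, Fraction(1))        # rolls 1, 1, 1
p_one_five_two = two_tailed_p(fair, Fraction(8, 3))   # rolls 1, 5, 2

print(p_three_ones, float(p_three_ones))      # 1/108, about 0.0093
print(p_one_five_two, float(p_one_five_two))  # 14/27, about 0.52
```

So under this test a sequence like 1, 5, 2 is nowhere near significant: what matters is the average, not how improbable the exact sequence is.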

You raise a valid point. This is clearly not the best test for detecting unfair dice: for a die that shows only the two values 3 and 4, each with equal probability, we would reject the null hypothesis even less often than for a fair die! The average of such a die always lies between 3 and 4, so it never reaches the rejection region. (In that case, the power would be below alpha, which is obviously pretty bad.)
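A quick way to see this, as a sketch: compute how often the average lands in the rejection region (an average of exactly 1 or 6 after three rolls) for each die. `rejection_probability` is a hypothetical helper, and the 3-or-4 die is assumed to show each of its two values with probability 1/2.

```python
from fractions import Fraction
from itertools import product

def rejection_probability(faces, n_rolls=3, reject_means=(1, 6)):
    """Chance that the average of n_rolls equally likely faces lands in the rejection region."""
    outcomes = list(product(faces, repeat=n_rolls))
    hits = sum(1 for rolls in outcomes
               if Fraction(sum(rolls), n_rolls) in reject_means)
    return Fraction(hits, len(outcomes))

alpha = rejection_probability(range(1, 7))  # fair die: false-positive rate of the test
power = rejection_probability((3, 4))       # unfair die showing only 3 or 4

print(alpha, power)  # 1/108 0
```

The unfair die is never flagged, while the fair one is flagged about 0.93% of the time, which is the sense in which the power falls below alpha.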