If the die is fair, the expected score is 3.5. One can define a test based on that value and reject the null hypothesis when the average score is too low or too high.

The sampling distribution of the average can be calculated exactly. For three rolls, the extreme values are 1 (three ones) and 6 (three sixes), which happen with probability 1/216 each. Getting three ones or three sixes is then a p = 2/216 ≈ 0.0093 result.
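To make the arithmetic concrete, here is a minimal sketch that enumerates all 216 outcomes of three rolls and builds the exact distribution of the average. The function name `mean_distribution` and its arguments are just illustrative choices, not anything from the thread.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def mean_distribution(die, n_rolls):
    """Exact distribution of the average of n_rolls independent rolls.

    `die` maps each face value to its probability (hypothetical helper).
    """
    dist = Counter()
    for rolls in product(die, repeat=n_rolls):
        p = Fraction(1)
        for face in rolls:
            p *= die[face]  # independent rolls: probabilities multiply
        dist[Fraction(sum(rolls), n_rolls)] += p
    return dict(dist)


fair = {face: Fraction(1, 6) for face in range(1, 7)}
dist = mean_distribution(fair, 3)

# Two-sided tail: average of exactly 1 (three ones) or exactly 6 (three sixes).
p_extreme = dist[Fraction(1)] + dist[Fraction(6)]
print(p_extreme, float(p_extreme))
```

Using `Fraction` keeps everything exact, so the tail probability comes out as 2/216 = 1/108 rather than a rounded float.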

You raise a valid point. This is clearly not the best test for detecting unfair dice: for a die that has only two equally probable values, 3 and 4, the average always falls between 3 and 4, so we would reject the null hypothesis even less often than for a fair die! (In that case, the power would be below alpha, which is obviously pretty bad.)
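A quick sketch of that comparison, assuming the three-roll test that rejects only when the average is exactly 1 or exactly 6. The helper `rejection_probability` and the `low`/`high` thresholds are illustrative names, not part of the original discussion.

```python
from fractions import Fraction
from itertools import product


def rejection_probability(die, n_rolls=3, low=1, high=6):
    """Probability that the average of n_rolls is <= low or >= high.

    `die` maps each face value to its probability (hypothetical helper).
    """
    total = Fraction(0)
    for rolls in product(die, repeat=n_rolls):
        p = Fraction(1)
        for face in rolls:
            p *= die[face]
        mean = Fraction(sum(rolls), n_rolls)
        if mean <= low or mean >= high:
            total += p
    return total


fair = {face: Fraction(1, 6) for face in range(1, 7)}
three_four = {3: Fraction(1, 2), 4: Fraction(1, 2)}

alpha = rejection_probability(fair)        # rejection rate under the null
power = rejection_probability(three_four)  # rejection rate for the 3/4 die
print(alpha, power, power < alpha)
```

Under these assumptions the 3/4 die never lands in the rejection region, so the power is 0 while alpha is 1/108.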