Hacker News
by mdbco 4128 days ago
The single paragraph in the postscript of this paper (part 6) is actually really important. It's very common for people using statistical tests in applied settings to forget entirely about type II error (and, correspondingly, the power of the test), so when they see a p-value that isn't significant at a given level (say 5%), they just assume that the null hypothesis is true.

Of course, this is not correct, and all we can really say is that the test did not reject the null, given the size (type I error rate) and power (one minus the type II error rate) of the test. It's entirely possible that the null is false and should be rejected, but the test is just not very good (i.e. it might have the correct size, but very poor power).

So given some complex and eccentric real-world data, how can we figure out what the power of a given test might be in practice? If you have some idea of what the data generating process might look like then one option is to do some simulations. This enables you to see what the size and power properties of your test are by empirically measuring the type I and type II error rates.
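
To make the simulation idea concrete, here is a minimal sketch in Python. Everything specific in it is an assumption chosen for illustration and does not come from the paper: the data generating process is i.i.d. normal with known standard deviation 1, the test is a two-sided one-sample z-test of "mean = 0" at the 5% level, the sample size is 20, and the alternative is a true mean of 0.2. The function name `rejection_rate` is made up for this example.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=20, sigma=1.0, alpha=0.05, n_sims=20000, seed=0):
    """Fraction of simulated samples in which a two-sided z-test rejects mean = 0.

    Assumed data generating process: n i.i.d. draws from Normal(true_mean, sigma),
    with sigma treated as known by the test.
    """
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        # z statistic for H0: mean = 0 with known sigma.
        z = fmean(sample) * n ** 0.5 / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# Null is true: the rejection rate estimates the size (type I error rate).
size = rejection_rate(true_mean=0.0)

# Null is false: the rejection rate estimates the power,
# and one minus that is the type II error rate.
power = rejection_rate(true_mean=0.2)

print(f"estimated size:  {size:.3f}")
print(f"estimated power: {power:.3f}")
print(f"estimated type II error rate: {1 - power:.3f}")
```

Under these assumptions the estimated size should land close to the nominal 5%, while the power against a mean of 0.2 comes out low (on the order of 15%), so a non-significant result here would say very little about whether the null is true. For real data you would swap in your own guess at the data generating process and the test you actually plan to run.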