Hacker News
by leo_pekelis 4163 days ago
One point of clarification: the y-axis on the chart does have the same meaning for both lines. It is 1 minus the chance of committing a type I error. I think you do point out an important nuance, which is that under sequential testing a type I error changes to “ever detecting a significant result on an insignificant test” instead of detecting one at a single, predetermined visitor count.

The amount of accumulated evidence for X is exactly a p-value, that is, a measurement which can tell you if there is enough evidence in the experiment to contradict a hypothesis of “no difference between a baseline and variation.” A high p-value, or low significance, tells you there is not enough evidence to contradict that hypothesis.

You bring up a very interesting point, which is that with sequential testing it is actually also possible to look for evidence of ‘not X’, or that there really is no detectable difference. This works by ‘flipping the hypothesis test on its head’ and allows for a mathematical formulation of stopping early for futility. We do not currently offer this in Stats Engine because we believe it’s the less important quantity of the two, but it may be the focus of future research.
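
For anyone curious what a two-sided stopping rule can look like, below is a minimal sketch of Wald's classic sequential probability ratio test (SPRT) on a single stream of conversions. This is a textbook technique swapped in for illustration, not a description of how Stats Engine works or would implement futility. The one-sample setup, the function name `sprt`, and the parameters `p0`, `p1`, `alpha`, and `beta` are all assumptions.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald SPRT for a Bernoulli rate: H0 rate = p0 versus H1 rate = p1.

    Returns (decision, n) where decision is one of:
      "reject_null": enough evidence for the difference (X)
      "futility":    enough evidence for no difference (not X)
      "continue":    data ran out before either boundary was crossed
    """
    upper = math.log((1 - beta) / alpha)  # crossing it rejects H0
    lower = math.log(beta / (1 - alpha))  # crossing it stops for futility
    step_hit = math.log(p1 / p0)               # log-likelihood step on a conversion
    step_miss = math.log((1 - p1) / (1 - p0))  # step on a non-conversion
    llr = 0.0
    n = 0
    for n, converted in enumerate(observations, start=1):
        llr += step_hit if converted else step_miss
        if llr >= upper:
            return "reject_null", n
        if llr <= lower:
            return "futility", n
    return "continue", n
```

For example, `sprt([0] * 100, p0=0.10, p1=0.15)` stops for futility well before all 100 visitors are seen, because a long run of non-conversions pushes the log-likelihood ratio below the lower boundary. The upper boundary plays the role of the usual significance check, and the lower one is the ‘flipped’ test.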