Hacker News new | ask | show | jobs
by ted_dunning 3440 days ago
P-values are just the wrong metric here. With good sequential testing, you would stop

a) when the challenger strategy is much worse than the best alternative with reasonable probability,

or

b) when it is clear that some strategy is very close to the best with high probability (remember, you may still not know how good the best is)

Note that waiting for significance kind of handles the first case, but in the case of ties, it completely fails on the second case. That is, when there are nearly tied options, you will wait nearly forever. It is better to pick one of the tied options and say that it is very likely as good as the best.