HN Mirror


by edash 5326 days ago

The impact of this article rests with this sentence:

"Try 26.1% – more than five times what you probably thought the significance level was."

That is, if you peek after every observation and stop as soon as you reach 5% significance, then even when there is no real difference between the options you have about a 26% chance of declaring one of them the winner. But a false positive doesn't mean the other option is significantly better; it just means neither is statistically better.
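The peeking effect can be reproduced with a quick A/A simulation. This is a minimal sketch under assumptions that are not in the original: a hypothetical 10% conversion rate in both arms, 500 visitors per arm, and a pooled two-proportion z-test at |z| > 1.96. Because both arms are identical, every "significant" result is a false positive. The exact rate depends on how many peeks you take, so it won't match the article's 26.1% exactly.

```python
import math
import random


def z_stat(conv_a: int, conv_b: int, n: int) -> float:
    """Pooled two-proportion z statistic, assuming n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    if se == 0:
        return 0.0  # no conversions (or only conversions) so far: nothing to test
    return (conv_a / n - conv_b / n) / se


def simulate(rate=0.1, n_per_arm=500, trials=1000, z_crit=1.96, seed=0):
    """A/A test: both arms share one true rate, so any 'win' is a false positive.

    Returns (peeking_false_positive_rate, fixed_horizon_false_positive_rate).
    The rate, sample size and trial count are hypothetical illustration values.
    """
    rng = random.Random(seed)
    peek_hits = fixed_hits = 0
    for _ in range(trials):
        conv_a = conv_b = 0
        stopped = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peeking: check after every pair of visitors, stop at the first "significant" z.
            if not stopped and abs(z_stat(conv_a, conv_b, n)) > z_crit:
                stopped = True
        peek_hits += stopped
        # Fixed horizon: a single test at the planned sample size.
        fixed_hits += abs(z_stat(conv_a, conv_b, n_per_arm)) > z_crit
    return peek_hits / trials, fixed_hits / trials


peek, fixed = simulate()
print(f"peeking: {peek:.1%}, fixed horizon: {fixed:.1%}")
```

The fixed-horizon rate should hover near the nominal 5%, while stopping at the first significant peek pushes the false-positive rate several times higher.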

And for most startups, I think that's a fine compromise.

Sometimes I'll launch a new design and test it just to make sure it's not terribly worse. If it reaches statistical significance (even if I "peek"), then I'm cool with the new design and will make the switch.

And I'll continue to test and tweak the new design immediately after finishing the previous test. The time saved from my lazy statistics means we can move much more quickly.

If we had thousands of "conversions" a day, then it would make sense to be deliberate with our testing methods. But we don't, we have tens of conversions per day. And we can improve much faster using half-assed split-tests and intuition.
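To put "tens of conversions per day" into numbers, here is a standard fixed-horizon sample-size calculation for a two-sided, two-proportion z-test (normal approximation). The 2% baseline conversion rate, 20% relative lift, 80% power and 50/50 traffic split are hypothetical choices for illustration, not figures from the comment.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_base, rel_lift, alpha=0.05, power=0.8):
    """Visitors needed per arm to detect p_base -> p_base * (1 + rel_lift)."""
    p_new = p_base * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_new - p_base) ** 2)


def days_needed(conversions_per_day, p_base, rel_lift):
    """Days to collect both arms, assuming all daily traffic is split 50/50 (hypothetical)."""
    visitors_per_day = conversions_per_day / p_base
    return 2 * sample_size_per_arm(p_base, rel_lift) / visitors_per_day


print(days_needed(30, 0.02, 0.20))    # "tens of conversions per day"
print(days_needed(3000, 0.02, 0.20))  # "thousands of conversions a day"
```

Under these assumptions the low-traffic case needs roughly four weeks of data per test, while the high-traffic case finishes in well under a day, which is the trade-off the comment describes.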

1 comment

extension 5326 days ago

There's no need to half-ass the test: you should be able to get the actual significance at any point in the experiment. The software just has to correctly calculate the conditional probability of significance.
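One standard way to keep peeking honest is to use a stricter per-peek cutoff, chosen so the overall false-positive rate across all peeks stays at 5%. The sketch below swaps in a Pocock-style constant boundary calibrated by Monte Carlo; it is not necessarily the calculation the commenter had in mind. It assumes a fixed number of equally spaced peeks and the normal approximation, under which the null z statistic at look k behaves like S_k / sqrt(k), with S_k a running sum of k independent standard normals.

```python
import math
import random


def max_abs_z(looks, rng):
    """One simulated A/A experiment: the largest |z| seen across `looks` equal-sized peeks."""
    s = 0.0
    best = 0.0
    for k in range(1, looks + 1):
        s += rng.gauss(0.0, 1.0)  # one batch of data's worth of standardized evidence
        best = max(best, abs(s) / math.sqrt(k))
    return best


def calibrated_threshold(looks, alpha=0.05, sims=20000, seed=1):
    """Constant |z| cutoff per peek giving an overall false-positive rate of about alpha."""
    rng = random.Random(seed)
    maxima = sorted(max_abs_z(looks, rng) for _ in range(sims))
    # The (1 - alpha) quantile of the per-experiment maximum |z|.
    return maxima[math.ceil((1 - alpha) * sims) - 1]


for looks in (1, 5, 10):
    print(looks, round(calibrated_threshold(looks), 2))
```

With one look the cutoff should come out near the familiar 1.96. With 5 equally spaced peeks it should land near the published Pocock constant of about 2.41, so you would stop early only when |z| crosses that stricter line.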
