
Thanks for replying. I agree with all the points you mention your statistician covered, but you should make sure your users know what kind of test you're using. The only reason I say this is because this article gives me the impression that you were using a single one-tailed test (which, as I said in my post, is a perfectly acceptable thing to do in the context of web site A/B testing).
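To make the one-tailed vs. two-tailed distinction concrete, here is a rough sketch of a pooled two-proportion z-test in plain Python. The conversion counts are made up for illustration, and I have no idea whether this is the test Optimizely actually runs; the function names are mine.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for H0: rate_b == rate_a, using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical counts: 200/2000 conversions for A, 236/2000 for B.
z = two_proportion_z(200, 2000, 236, 2000)
p_one_tailed = normal_sf(z)            # H1: B converts better than A
p_two_tailed = 2 * normal_sf(abs(z))   # H1: B differs from A (either direction)

print(f"z = {z:.3f}")
print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```

With these particular numbers the one-tailed p-value falls under 0.05 while the two-tailed one does not, so the same data is "significant" or not depending on which test the tool uses. That's why users need to be told.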

But as for "Optimizely encourages you to stop the test as soon as it reaches 'statistical significance,'" I'm not saying your user documentation or anything encourages people to stop tests early. I'm saying (and this is based only on the article, as I've never used Optimizely) that your platform is psychologically encouraging users to stop tests early. E.g., from the article:

    Most A/B testing tools recommend terminating tests as soon as they show significance, even though that significance may very well be due to short-term bias. A little green indicator will pop up, as it does in Optimizely, and the marketer will turn the test off.

    <image with a green check mark saying "Variation 1 is beating Variation 2 by 18.1%">

    But most tests should run longer and in many cases it’s likely that the results would be less impressive if they did. Again, this is a great example of the default settings in these platforms being used to increase excitement and keep the users coming back for more.

I am aware of literature in experimental design that talks about criteria for stopping an experiment before its designed conclusion. Such things are useful in, say, medical research, where if you see a very strong positive or negative result early on, you want to have that safety valve to either get the drug/treatment to market more quickly or to avoid hurting people unnecessarily.

Unless you've built that analysis into when you display your "success message" that "Variation 1 is beating Variation 2 by 18.1%," I'd argue that you're doing users a disservice. When I see that message, I want to celebrate, declare victory, and stop the test; and that's not what you should encourage people to do unless it's statistically sound to do so.

The other thing in the article that led me to this position is that you display "conversion rate over time" as a time series graph. Again, if I see that and I notice one variation is outperforming the other, what I want to do is declare victory and stop the test. That might not be mathematically/statistically warranted.
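Here is a quick simulation sketch of why stopping at the first "significant" peek is a problem. It runs A/A tests (both variations have the same true conversion rate, so any "winner" is a false positive) and compares checking at every peek against checking only at the planned end. The 10% rate, sample size, peek interval, and the plain z-test are all assumptions I picked for illustration, not anything taken from Optimizely.

```python
import math
import random

def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic with n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variation yet, so nothing to test
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b / n - conv_a / n) / se

def run_aa_test(rng, rate, n_per_arm, peek_every, z_crit=1.96):
    """Return (significant_at_some_peek, significant_at_planned_end)."""
    conv_a = conv_b = 0
    hit_on_peek = False
    for i in range(1, n_per_arm + 1):
        conv_a += int(rng.random() < rate)
        conv_b += int(rng.random() < rate)
        # A user watching the dashboard "peeks" every peek_every visitors.
        if i % peek_every == 0 and abs(z_stat(conv_a, conv_b, i)) > z_crit:
            hit_on_peek = True
    hit_at_end = abs(z_stat(conv_a, conv_b, n_per_arm)) > z_crit
    return hit_on_peek, hit_at_end

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 500
results = [run_aa_test(rng, rate=0.10, n_per_arm=2000, peek_every=100)
           for _ in range(trials)]
peek_rate = sum(p for p, _ in results) / trials
final_rate = sum(f for _, f in results) / trials

print(f"false positives if you stop at the first significant peek: {peek_rate:.1%}")
print(f"false positives if you only look at the planned end: {final_rate:.1%}")
```

The end-of-test rate should land near the nominal 5%, while the peeking rate comes out several times higher, even though there is no real difference between the variations. That's the gap a green check mark or a live time series invites people to fall into.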

IMO, as a provider of statistical software, you'd do your users a service by not displaying anything about a running experiment by default until it's either finished or you can mathematically say it's safe to stop the trial. Some people will want their pretty graphs and such, so give them a way to see them, but make them expend some effort to do so. Same thing with prematurely ended experiments: don't provide any conclusions based on an incomplete trial. Give users the ability to download the raw data from a prematurely ended experiment, but don't make it easy or the default.