by scuba_man_spiff 3815 days ago
One thing I noticed that I haven't seen commented on yet:

The proposed fix of running a two-tailed test would not have prevented the false positive that the author demonstrated with an A/A test.

According to the image in the article: http://blog.sumall.com/wp-content/uploads/2014/06/optimizely...

The A/A test had:

    A1: Population: 3920 Conversion: 721
    A2: Population: 3999 Conversion: 623

    Z-Score: 3.3
    2-tailed test significance: 99.92%

Looks like one-tailed vs. two-tailed doesn't make a huge difference in this case.
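For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. It assumes a pooled two-proportion z-test, which is my guess at how figures like these are usually derived and not something the article confirms; the function name is made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test (assumed method, not confirmed by the article).

    Returns (z, one_tailed_p, two_tailed_p).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that both cells are identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-tailed p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    two_tailed_p = math.erfc(abs(z) / math.sqrt(2))
    one_tailed_p = two_tailed_p / 2
    return z, one_tailed_p, two_tailed_p

# Counts taken from the A/A test quoted above.
z, one_tailed_p, two_tailed_p = two_proportion_z_test(721, 3920, 623, 3999)
print(f"z = {z:.2f}")
print(f"one-tailed significance: {(1 - one_tailed_p) * 100:.2f}%")
print(f"two-tailed significance: {(1 - two_tailed_p) * 100:.2f}%")
```

The one-tailed p-value is just half the two-tailed one, so at a z-score this large both versions sit well above 99.9% and switching between them cannot turn the result into a non-significant one.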

So, maybe a larger sample size would have shown a reversion to the mean, but given the sample size and the high significance that would be unlikely. (An interesting exercise is to try different assumptions and calculate how unlikely; the most generous estimate is obviously just the stated significance.)

Yes, the test was only conducted over one day, but if it was the exact same thing being served for both, that shouldn't matter.

If there was a reversion to the mean due to an early spike, we would expect to see the % difference between the two cells narrow as the test kept running. You can see in the chart that the % difference (relative gap between the lines) stays about the same after 8pm on the 9th.

So if it's not the one-tailed test at fault, and it's not the short duration of the test at fault, what is?

Don't know.

I have seen in the past that setup mistakes are incredibly easy to make when implementing A/B testing tools on your site. In other tools I've seen things like automated traffic from Akamai going only to the default control, or subsets of traffic, such as returning visitors, being excluded from some cells but not others.

Based on those results, I'd be suspicious of something in the tool setup being amiss.
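One rough first check for that kind of setup problem is to see whether the split of visitors between cells looks like what the tool was supposed to produce. The sketch below is a chi-square goodness-of-fit test with one degree of freedom. It assumes the intended allocation was 50/50, which the article does not state, and the function name is hypothetical.

```python
import math

def allocation_check(n_a, n_b):
    """Chi-square goodness-of-fit test against an ASSUMED 50/50 traffic split.

    Returns (chi2, p_value). A very small p-value would suggest the cells
    did not receive traffic in the intended proportion.
    """
    expected = (n_a + n_b) / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Population counts from the A/A test quoted above.
chi2, p_value = allocation_check(3920, 3999)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

This only looks at how many visitors landed in each cell. It would not catch cells that are the same size but differ in composition, for example bots or returning visitors concentrated in one of them, so a clean result here doesn't rule out the problems described above.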