| I recently implemented A/B testing on a client's site using one of these Javascript-based A/B testing tools (but not this one). I hadn't used one before, so wanted to verify the data would actually be accurate. I did an A/A test, basically testing the same exact page––expecting the results would be the same. Not only were the results not the same, but they were off by a wide margin. Given this, I don't know how I'm supposed to trust any of the data. Has anyone else had experiences like this? Is A/B testing in Javascript just not as reliable? |
That's not how A/B testing works. :)
Let's say we want to detect a 1% lift in some metric at 95% confidence and we set up an A/A test. We do the math and it tells us we need to sample 1,000 people to reach 95% confidence on a 1% lift.
If we ran the A/A test 100 times, we'd expect roughly 5 of them to show a statistically significant difference between the two groups, even though the groups are identical. That's what "95% confidence" means -- your false positive rate is 5%. A false positive like this is called a Type I error.
http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
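To make the 5% figure concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the 10% baseline conversion rate, the 1,000 visitors per arm, the pooled two-proportion z-test, and the function names. It isn't how any particular A/B testing tool computes significance, just one common way to do it.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both arms converted 0% or 100%: there is no difference to detect.
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=1000, visitors_per_arm=1000, rate=0.10, alpha=0.05, seed=0):
    """Run many A/A tests and return the fraction flagged as significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both arms draw from the same true conversion rate (hypothetical value).
        conv_a = sum(rng.random() < rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < rate for _ in range(visitors_per_arm))
        p = two_proportion_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm)
        if p < alpha:
            false_positives += 1
    return false_positives / runs


if __name__ == "__main__":
    print(f"Share of A/A tests flagged significant: {simulate_aa_tests():.3f}")
```

The printed share should land somewhere near 0.05, though it will wobble from seed to seed. A single A/A test that comes back "significant" is one of those expected false positives, not proof the tool is broken.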
You could run a kind of meta-analysis, using the false positive rate as the variable you're measuring. The question is then whether there's a statistically significant difference between the 5% false positive rate you expect and the false positive rate the A/B testing software generates in practice.
In this case, your null hypothesis is that the "true alpha" of the A/B testing software is 0.05. You'd run a batch of A/A tests at 95% confidence, count how many come back significant, and see whether you can reject the null hypothesis.
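One way to sketch that check is an exact binomial test on the number of A/A tests that came back significant. The function name and the counts below are made up for illustration, and the two-sided rule used here (sum the probability of every outcome no more likely than the observed one) is just one common convention.

```python
import math


def binomial_two_sided_p(k, n, p0=0.05):
    """Exact two-sided binomial p-value for k 'significant' results out of n A/A tests.

    Null hypothesis: the true false positive rate is p0.
    """
    def pmf(i):
        return math.comb(n, i) * p0**i * (1 - p0) ** (n - i)

    observed = pmf(k)
    # Sum over outcomes at most as likely as the observed count
    # (small relative tolerance guards against floating-point ties).
    total = sum(pmf(i) for i in range(n + 1) if pmf(i) <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: 100 A/A tests, with 5 or 20 of them flagged significant.
    for flagged in (5, 20):
        p = binomial_two_sided_p(flagged, 100)
        print(f"{flagged}/100 flagged -> p-value {p:.4g}")
```

If 5 out of 100 A/A tests are flagged, that's exactly what alpha = 0.05 predicts and you can't reject the null. If 20 out of 100 are flagged, the p-value is tiny and it would be fair to suspect the tool's false positive rate really is higher than advertised.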