by jfarmer 4915 days ago
I've worked at companies that tried to do this before. It makes no sense, and it shows that the people running the A/B tests don't really understand the statistics behind A/B testing.

If I'm running A/A tests at 95% confidence with a sufficient number of visitors for whatever effect size I'm interested in, then on average 1 in 20 A/A tests will register a false positive. That's what "95% confidence" means: a 5% false-positive rate when there is no real difference. It does not mean there is "too much noise."
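A quick way to see this is to simulate it. The sketch below (Python, standard library only) runs many A/A tests in which both arms share the same true conversion rate and counts how often a pooled two-proportion z-test calls the difference significant at alpha = 0.05. The 10% conversion rate, 1,000 visitors per arm, the number of simulated tests, the seed, and the function names are made-up illustration choices, not anything from a real experiment.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    # 2 * (1 - Phi(|z|)), written via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_false_positive_rate(n_tests=2000, n_visitors=1000,
                                    rate=0.10, alpha=0.05, seed=42):
    """Run many A/A tests (both arms have the same true rate) and return
    the fraction that come out 'significant' at level alpha."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        conv_a1 = sum(rng.random() < rate for _ in range(n_visitors))
        conv_a2 = sum(rng.random() < rate for _ in range(n_visitors))
        if two_proportion_p_value(conv_a1, n_visitors, conv_a2, n_visitors) < alpha:
            false_positives += 1
    return false_positives / n_tests


fp_rate = simulate_aa_false_positive_rate()
print(f"A/A tests flagged as significant: {fp_rate:.3f}")
```

With these parameters the printed fraction should sit close to 0.05, give or take simulation noise: roughly 1 in 20 A/A tests "wins" even though nothing differs between the arms.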

Moreover, in a proper A/B test, the A group and B group need to be independent and identically distributed. So, in an A/A/B test, if the two A groups disagree, that shouldn't tell you anything about B. That's what "independent" means.

If you want to be more confident, you just decrease your alpha (i.e., raise the confidence level). alpha=0.05 (95% confidence) is already stricter than most consumer web apps need anyhow, IMO, but go wild. alpha=0.01, 99% confidence! Woo!
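To make the alpha/confidence relationship concrete, here's a small sketch using only Python's standard library (`statistics.NormalDist`). It prints the two-sided critical |z| for a few alphas, plus a rough per-group sample size for a two-proportion test from the usual normal-approximation formula. The 10% baseline, the 11% variant rate, 80% power, and the helper names are made-up illustration values, not numbers from this thread.

```python
import math
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical value: reject H0 when |z| exceeds this."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def required_n_per_group(p_base, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors per group for a two-proportion z-test
    (normal approximation, unpooled variance)."""
    z_alpha = critical_z(alpha)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_base - p_variant) ** 2)


# Hypothetical scenario: 10% baseline conversion, hoping to detect 11%.
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f} ({1 - alpha:.0%} confidence): "
          f"reject if |z| > {critical_z(alpha):.3f}, "
          f"~{required_n_per_group(0.10, 0.11, alpha=alpha)} visitors per group")
```

Lowering alpha pushes the critical value up (about 1.645, 1.960, and 2.576 for the three alphas above), and the required traffic grows with it. That is the price you pay for the extra confidence.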

As a rule, you want higher confidence when the cost of a mistake is high, e.g., this medicine gives people brain tumors! Oops.


Perhaps you could view this "A/A/B" test as a very crude form of the bootstrapping method (http://en.wikipedia.org/wiki/Bootstrapping_(statistics))? At least if you're resampling A1 and A2 from a pool A, running separate A1/B and A2/B tests, and looking at how much the resulting statistic varies between the two runs.

Agreed, this is a silly way to go about it, but there are better-thought-out bootstrapped confidence tests that could be used if you don't fully trust the distributional assumptions behind (say) the t-test.
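For the curious, here's a minimal percentile-bootstrap sketch in Python of the kind of bootstrapped interval mentioned above: resample each group with replacement, recompute the difference in conversion rates each time, and read the interval off the sorted differences. The function name `bootstrap_diff_ci` and the 10% vs 13% example data are hypothetical. This is the plain percentile bootstrap, not a bias-corrected (BCa) variant.

```python
import random


def bootstrap_diff_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a).

    a, b: per-visitor outcomes, e.g. 1 = converted, 0 = did not.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(sum(resample_b) / len(b) - sum(resample_a) / len(a))
    diffs.sort()
    # Read the alpha/2 and 1 - alpha/2 percentiles off the sorted differences.
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 100/1000 conversions in A, 130/1000 in B.
a = [1] * 100 + [0] * 900
b = [1] * 130 + [0] * 870
lower, upper = bootstrap_diff_ci(a, b)
print(f"95% bootstrap CI for B - A: [{lower:.4f}, {upper:.4f}]")
```

Nothing here assumes normality: the interval comes straight from the empirical distribution of the resampled differences, which is the appeal when you don't trust the t-test's assumptions.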

I wish! The words "empirical distribution" are music to my ears.

No, usually the rule people use is this: "If A1 and A2 show a statistically significant difference, then do not reject the null hypothesis regardless of A1/B or A2/B."
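A quick simulation sketch (Python, standard library only) of why that rule is odd: the veto is decided entirely by the A1 vs A2 comparison, so it fires about alpha of the time whether or not B is actually different, and when it fires it throws away real wins along with false ones. The conversion rates, sample sizes, seed, and the helper names `two_proportion_p_value` and `simulate_aab_rule` are illustrative assumptions. The significance test used is a pooled two-proportion z-test.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aab_rule(b_rate, a_rate=0.10, n=1000, n_tests=1000,
                      alpha=0.05, seed=7):
    """Return (veto_rate, win_rate_plain, win_rate_with_veto).

    veto: A1 vs A2 is 'significant', so the A/A/B rule refuses to reject
    the null. win: A1 vs B is significant (plain), or significant and
    not vetoed (with veto)."""
    rng = random.Random(seed)
    vetoes = wins_plain = wins_with_veto = 0
    for _ in range(n_tests):
        a1 = sum(rng.random() < a_rate for _ in range(n))
        a2 = sum(rng.random() < a_rate for _ in range(n))
        b = sum(rng.random() < b_rate for _ in range(n))
        # The veto is computed from A1 and A2 alone; B's draws never enter it.
        veto = two_proportion_p_value(a1, n, a2, n) < alpha
        win = two_proportion_p_value(a1, n, b, n) < alpha
        vetoes += veto
        wins_plain += win
        wins_with_veto += win and not veto
    return vetoes / n_tests, wins_plain / n_tests, wins_with_veto / n_tests


results = {}
for b_rate in (0.10, 0.13):  # B identical to A, then B genuinely better
    results[b_rate] = simulate_aab_rule(b_rate)
    veto, plain, gated = results[b_rate]
    print(f"B rate {b_rate:.2f}: veto {veto:.3f}, "
          f"win {plain:.3f}, win with veto {gated:.3f}")
```

Because the veto only looks at the A1 and A2 draws, and the same seed produces the same A draws in both runs, the two printed veto rates are identical. The rule can't tell a real B effect from no effect; all it does is discard some tests, so the "win with veto" rate can never exceed the plain win rate.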