Some people actually suggest running A/A/B tests just to gauge how much noise is in their numbers, though that requires even more visitors to achieve statistical confidence since they're spread out among more options.
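To put a rough number on the extra traffic, here is a sketch using the usual normal-approximation sample-size formula for a two-sided two-proportion z-test. The 10% baseline conversion rate, the 1-point absolute lift and 80% power are made-up illustration values. It also ignores any multiple-comparison correction, so treat it as back-of-the-envelope only.

```python
from math import ceil
from statistics import NormalDist

def per_arm_sample_size(p_base: float, abs_lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per arm for a two-sided two-proportion z-test."""
    p_new = p_base + abs_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / abs_lift ** 2)

# Hypothetical numbers: 10% baseline, detect a +1 percentage point lift.
n = per_arm_sample_size(p_base=0.10, abs_lift=0.01)
for arms in (2, 3):  # plain A/B versus A/A/B
    print(f"{arms} arms: {arms * n} visitors in total")
```

The per-arm size depends only on alpha, power and the effect you care about, so an A/A/B split needs 1.5x the total traffic of a plain A/B split to give each comparison the same power.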
I've worked at companies that tried to do this before. It makes no sense and shows the people running the A/B tests don't really understand the statistics behind A/B testing.
If I'm running A/A tests at 95% confidence with enough visitors for whatever effect size I'm interested in, then on average 1 in 20 of them will register a false positive. That's what "95% confidence" means. It does not mean there is "too much noise."
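A quick way to see the 1-in-20 point is to simulate it: run many A/A tests where both arms share the same true conversion rate, and count how often a pooled two-proportion z-test comes out significant at alpha=0.05. The arm size, conversion rate and number of trials below are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions at all (or all converted): no evidence either way
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_false_positive_rate(trials: int = 1000, n: int = 1000, p: float = 0.1,
                           alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated A/A tests that come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both arms use the same conversion rate p, so the null is true by construction.
        conv_a = sum(rng.random() < p for _ in range(n))
        conv_b = sum(rng.random() < p for _ in range(n))
        if two_proportion_p_value(conv_a, n, conv_b, n) < alpha:
            hits += 1
    return hits / trials

rate = aa_false_positive_rate()
print(rate)  # should land near 0.05
```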
Moreover, in a proper A/B test, the A group and B group need to be independent and identically distributed. So, in an A/A/B test, if the A/A disagree it shouldn't tell you anything about B. That's what "independent" means.
If you want to be more confident, you just lower your alpha. alpha=0.05 (95% confidence) is already a higher bar than most consumer web apps need anyhow, IMO, but go wild. alpha=0.01, 99% confidence! Woo!
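Lowering alpha is what raises the confidence level (alpha=0.01 is 99% confidence), and the price is traffic. This sketch computes the two-sided critical values and, assuming 80% power and the usual normal approximation, how many more visitors per arm alpha=0.01 needs than alpha=0.05 for the same effect size. It comes out at roughly 1.5x.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Two-sided critical value: reject H0 when |z| exceeds this."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_multiplier(alpha_new: float, alpha_old: float = 0.05, power: float = 0.8) -> float:
    """Ratio of per-arm sample sizes at alpha_new vs alpha_old, same power and effect size."""
    z_beta = NormalDist().inv_cdf(power)
    return ((z_critical(alpha_new) + z_beta) / (z_critical(alpha_old) + z_beta)) ** 2

for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: {1 - alpha:.0%} confidence, reject if |z| > {z_critical(alpha):.3f}")
print(f"alpha=0.01 needs about {sample_size_multiplier(0.01):.2f}x the visitors per arm")
```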
As a rule you want higher confidence when the cost of a mistake is high, e.g., this medicine gives people brain tumors! Oops.
Perhaps you could view this "A/A/B" test as a very crude form of the bootstrap method described at http://en.wikipedia.org/wiki/Bootstrapping_(statistics)? At least if you're resampling A1 and A2 from a pool A, then running separate A1/B and A2/B tests and looking at how much the resulting statistic varies between the two runs.
Agreed, this is a silly way to go about it, but there are better-thought-out bootstrapped confidence tests that could be used if you don't fully trust the distributional assumptions behind (say) the t-test.
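For reference, a minimal percentile-bootstrap sketch, one of the simpler bootstrapped approaches and not the A/A/B scheme above: resample each arm with replacement, recompute the difference in conversion rates each time, and read the confidence interval off the empirical distribution of those differences. The 0/1 conversion data in the usage lines is made up.

```python
import random

def bootstrap_diff_ci(a: list[int], b: list[int], n_boot: int = 1000,
                      level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for mean(b) - mean(a), resampling each arm with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(sum(resample_b) / len(b) - sum(resample_a) / len(a))
    diffs.sort()  # empirical distribution of the difference
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

a = [1] * 100 + [0] * 900  # hypothetical control: 10% conversion
b = [1] * 130 + [0] * 870  # hypothetical variant: 13% conversion
print(bootstrap_diff_ci(a, b))
```

If the interval excludes zero, the bootstrap agrees B differs from A, without leaning on the normal approximation.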
I wish! The words "empirical distribution" are music to my ears.
No, usually the rule people use is this: "If A1 and A2 show a statistically significant difference, then do not reject the null hypothesis regardless of A1/B or A2/B."
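The rule amounts to a veto. A minimal sketch, with the caveat that the fallback (one ordinary A vs B p-value when A1 and A2 agree) is an assumption here; the rule itself only says what to do when they disagree.

```python
def aab_decision(p_a1_a2: float, p_ab: float, alpha: float = 0.05) -> bool:
    """Return True if H0 (no difference between A and B) is rejected under the A/A veto rule."""
    if p_a1_a2 < alpha:
        # A1 and A2 differ significantly: don't reject H0, whatever A/B says.
        return False
    # Assumed fallback: the usual A vs B test on a single p-value.
    return p_ab < alpha
```

One consequence of the 1-in-20 point made earlier in the thread: even a correctly set-up experiment will trip this veto about alpha of the time.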
I do this when I'm not confident I set up the experiment correctly. If the A's differ by quite a bit, it's more likely that I made a mistake than that it's normal statistical variance. I make mistakes daily.
I think that's a pretty sensible reason for A/A/B testing. Or A/B/B testing. Whatever you like.