by bjlorenzen 4402 days ago
Splitting traffic into four buckets instead of two like that will improve your confidence in the results, but it will also double the required sample size or testing duration. You could just as easily keep two buckets and wait twice as long to achieve the same effect.
2 comments

A/A testing (null testing) or A/A/B testing serves a different purpose than A/B testing.

Microsoft Research suggested (http://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTh...) that you continuously run A/A tests alongside your experiments. An A/A test can:

- collect data and assess its variability for power calculations

- test the experimentation system (the null hypothesis should be rejected about 5% of the time when a 95% confidence level is used)

- tell if users are split according to the planned percentages
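
To make the 5% figure concrete, here is a rough simulation sketch, not anything taken from the linked paper. It assumes a pooled two-proportion z-test, 500 users per bucket, and a 10% true conversion rate in both buckets; the function names and all of those numbers are made up for illustration.

```python
import math
import random


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (pooled variance)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variation at all, so no evidence of a difference
    return (conv_a / n_a - conv_b / n_b) / se


def aa_false_positive_rate(trials=1000, users_per_bucket=500, true_rate=0.10, seed=0):
    """Run many simulated A/A tests and return the fraction that reject the null."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided critical value for a 95% confidence level
    rejections = 0
    for _ in range(trials):
        # Both buckets get the identical experience, so any "win" is noise.
        conv_a = sum(rng.random() < true_rate for _ in range(users_per_bucket))
        conv_b = sum(rng.random() < true_rate for _ in range(users_per_bucket))
        z = two_proportion_z(conv_a, users_per_bucket, conv_b, users_per_bucket)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(f"A/A tests rejecting the null: {aa_false_positive_rate():.1%}")
```

If the printed fraction sits near 5%, the pipeline is behaving as expected. A rate far above that on real A/A data would point to a problem in the assignment or the analysis rather than a real difference.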

Can you explain why? I'm struggling with the math behind the whole thing as it is, but intuitively this sounds like a very clever hack. I wonder why it would double the experiment time if people are effectively seeing either the A or the B variant.
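
One way to see the doubling claim is plain arithmetic, sketched below. It assumes every bucket is analyzed on its own and needs the same number of users, and that traffic is split evenly. The function name `days_to_fill` and the traffic numbers are hypothetical.

```python
def days_to_fill(users_per_bucket, buckets, daily_traffic):
    """Days until every bucket has reached its target size under an even split."""
    return users_per_bucket * buckets / daily_traffic


# Hypothetical numbers: 10,000 users needed per bucket, 1,000 visitors per day.
two_buckets = days_to_fill(10_000, buckets=2, daily_traffic=1_000)
four_buckets = days_to_fill(10_000, buckets=4, daily_traffic=1_000)

print(two_buckets, four_buckets)  # 20.0 40.0
```

Each of four buckets receives a quarter of the traffic instead of half, so it takes twice as long to reach the same per-bucket count. This only holds under the assumption above that each bucket must reach the target size by itself.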