by bjlorenzen
4402 days ago
Using four buckets instead of two like that will improve your confidence in the results, but it will also double the required sample size or testing duration. You could just as easily use two buckets and run the test twice as long to achieve the same effect.
Microsoft Research suggested (http://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTh...) that you continuously run A/A tests alongside your experiments. An A/A test can:
- collect data and assess its variability for power calculations
- test the experimentation system (the null hypothesis should be rejected about 5% of the time when a 95% confidence level is used)
- tell whether users are split according to the planned percentages
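
To make the 5% point concrete, here is a rough simulation sketch, not anything taken from the linked paper. It assumes a pooled two-proportion z-test, a made-up 10% conversion rate, and 500 users per bucket; the function names are my own. Both buckets get the identical experience, so every "significant" result is a false positive, and a healthy system should flag roughly 5% of runs at a 95% confidence level.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(n_tests=1000, users_per_bucket=500,
                           rate=0.10, alpha=0.05, seed=0):
    """Fraction of simulated A/A tests that reject the null hypothesis.

    The conversion rate and bucket size are illustrative assumptions.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # Both buckets draw from the same true conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(users_per_bucket))
        conv_b = sum(rng.random() < rate for _ in range(users_per_bucket))
        p = two_proportion_p_value(conv_a, users_per_bucket,
                                   conv_b, users_per_bucket)
        if p < alpha:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    print(f"A/A rejection rate: {aa_false_positive_rate():.3f}")
```

If your real A/A tests reject much more or much less often than alpha, that points at a problem in the bucketing or the statistics rather than in the product.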