

by jfarmer 4599 days ago

There's no such thing as a "small" or "large" sample size, per se. If you're doing it rigorously, you need to fix both your confidence level (e.g., 95%) and the effect size you want to be able to detect (e.g., a 50% lift in metric X relative to your control) before you start. You can then do some simple math which will tell you what sample size you need so that a real 50% lift in metric X would reliably come out as statistically significant, while keeping the chance of a false positive (declaring a winner when the variants are actually identical) at 5%. ("Reliably" here is the test's statistical power, which you also have to pick; 80% is a common choice.) Finally, you run the test until you've sampled that many users and stop the test. If there's a winning variant and it's statistically significant, congrats! If not, go back to square one.

The larger the effect size, the smaller your sample size can be before you reach that conclusion.
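To make the "simple math" concrete, here's a rough sketch of one common way to do it: the normal-approximation formula for a two-sided two-proportion z-test. The function name, the 10% baseline conversion rate, and the 80% power default are my own assumptions for illustration, not anything a particular A/B framework prescribes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant (two-sided two-proportion z-test).

    baseline_rate: control conversion rate, e.g. 0.10 (assumed for illustration)
    relative_lift: lift you want to detect, e.g. 0.50 for a 50% lift
    alpha:         false-positive rate, i.e. 1 - confidence level
    power:         chance of detecting the lift if it is real (assumed 80%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Same baseline, shrinking effect size: the required sample grows quickly.
    for lift in (0.50, 0.25, 0.10):
        n = sample_size_per_variant(0.10, lift)
        print(f"{lift:.0%} lift -> about {n} users per variant")
```

The required sample size grows roughly with the inverse square of the absolute difference you're trying to detect, which is why halving the effect size costs you about four times as many users.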

Most folks don't fix the desired effect size and instead just create a bunch of variants, start the A/B test, wait for the A/B testing framework to shout "statistically significant!", and then declare a winning variant. If the sample size seems "too small" they might not feel comfortable declaring a winner, so they perfunctorily "get a few more samples." Neither of these is rigorous, so it's a bit pointless to debate which one is "better."
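If you want to see why stopping as soon as the framework says "significant" isn't rigorous, a small simulation helps. This is only a sketch under made-up assumptions (an A/A test where both variants convert at 10%, a peek after every 50 users per arm, 20 peeks, and a plain two-proportion z-test); the function names are hypothetical. It counts how often a nominal 5% test calls a winner at any peek versus only at the pre-planned end.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided p-value for a two-proportion z-test with n users per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so nothing to test
    se = sqrt(2 * pooled * (1 - pooled) / n)
    z = (conv_b / n - conv_a / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(rate=0.10, batch=50, peeks=20, trials=400,
                         alpha=0.05, seed=0):
    """Simulate A/A tests; return (rate with peeking, rate at fixed horizon)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _ in range(peeks):
            # Both arms share the same true rate, so any "winner" is noise.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(conv_a, conv_b, n) < alpha:
                ever_significant = True  # a peeker would have stopped here
        peeking_hits += ever_significant
        fixed_hits += p_value(conv_a, conv_b, n) < alpha
    return peeking_hits / trials, fixed_hits / trials


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positives when peeking:    {peeking:.1%}")
    print(f"false positives at fixed sample: {fixed:.1%}")
```

With identical variants, the fixed-sample test should call a winner about 5% of the time, while checking at every peek gives it many more chances to cross the threshold by luck.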