by sokoloff 674 days ago
I recall getting into a heated debate with an analyst at my company over the topic of "peeking". He was right and I was wrong, but it took me several days to finally understand what he was saying.

The temptation to "peek" and keep on peeking until the test confesses to the thing you want it to say is very high.

This is the most "damned if you do, damned if you don't" part of testing. I've found so many coding errors that weren't obvious until you looked at the day 2 or day 3 test results. "Hm, that's weird. Why is $thing happening in this test? It shouldn't even touch that component."

If you peek, you really have to commit to running the test for the full duration no matter what.

No, you don't. If your protocol involves peeking (and early stopping), you need different thresholds to declare statistical significance. But you can do that. You just need to know whether you're peeking or not, which everybody does.
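
To make the "different thresholds" point concrete, here is a rough simulation sketch, not anyone's production method. It runs A/A tests (no real difference between arms) and counts how often a two-proportion z-test crosses the threshold at any look. The function names, sample sizes, and the 20% base rate are made up for illustration; 2.413 is the tabulated Pocock boundary for 5 equally spaced looks at an overall two-sided alpha of 0.05.

```python
import math
import random


def z_stat(succ_a, succ_b, n):
    """Two-proportion z statistic with n users in each arm."""
    pooled = (succ_a + succ_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (succ_b / n - succ_a / n) / se


def false_positive_rate(n_sims, n_per_arm, n_looks, z_crit, rate=0.2, seed=0):
    """Fraction of A/A tests declared significant at any of n_looks looks."""
    rng = random.Random(seed)
    step = n_per_arm // n_looks  # users added to each arm between looks
    hits = 0
    for _ in range(n_sims):
        a = b = 0
        for look in range(1, n_looks + 1):
            # Both arms share the same true rate, so any "win" is a false positive.
            a += sum(rng.random() < rate for _ in range(step))
            b += sum(rng.random() < rate for _ in range(step))
            if abs(z_stat(a, b, look * step)) > z_crit:
                hits += 1
                break  # early stopping: the test is called at this look
    return hits / n_sims
```

Comparing `false_positive_rate(1000, 500, 1, 1.96)` (one look), `false_positive_rate(1000, 500, 5, 1.96)` (five looks, naive threshold), and `false_positive_rate(1000, 500, 5, 2.413)` (five looks, stricter threshold) should show the naive peeking rate well above the nominal 5%, with the stricter boundary pulling it back down.
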
> If you peek, you really have to commit to running the test for the full duration no matter what.

It's more complicated, but you can also run sequential A/B tests using [SPRT](https://en.wikipedia.org/wiki/Sequential_probability_ratio_t...) or similar, where a test gets accepted or rejected once it hits a threshold. I won't go into the details, but you can incrementally calculate the test statistic, so if your test is performing very badly or very well, the test will end early.
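
For the curious, a minimal sketch of the incremental calculation, assuming the simplest case: Wald's SPRT on a single stream of 0/1 outcomes, testing a baseline rate `p0` against an alternative `p1`. A real A/B setup compares two arms and needs more care; the function name and the default `alpha`/`beta` here are illustrative choices, not taken from any particular framework.

```python
import math


def sprt(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for Bernoulli data: H0 says rate = p0, H1 says rate = p1.

    Returns (decision, n), where n is the number of observations consumed.
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0  # running log-likelihood ratio, updated one observation at a time
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n  # data ran out before either threshold was hit
```

With `p0=0.1` and `p1=0.3`, a run of three straight conversions is already enough to stop in favor of H1, while a run of non-conversions needs seven observations before it stops in favor of H0. That asymmetry comes straight from the thresholds and the per-observation increments.
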

One product team I worked on ran all tests as sequential tests. If you build a framework around this, I'd argue it's easier for statistics-unaware stakeholders to understand when you _can_ end a test early.

If there is a bug, then the experiment needs to be called off and a new one constructed. You shouldn't change anything else during the execution of the experiment.