| HN Mirror



	by sokoloff 674 days ago
	I recall getting into a heated debate with an analyst at my company over the topic of "peeking" (he was right and I was wrong, but it took me several days to finally understand what he was saying). The temptation to "peek" and keep on peeking until the test confesses to the thing you want it to say is very high.


jakevoytko 674 days ago

This is the most "damned if you do, damned if you don't" part of testing. I've found so many coding errors that weren't obvious until you looked at the day 2 or day 3 test results. "Hm, that's weird. Why is $thing happening in this test? It shouldn't even touch that component."

If you peek, you really have to commit to running the test for the full duration no matter what.


thaumasiotes 674 days ago

No you don't. If your protocol involves peeking (and early stopping), you need different thresholds to declare statistical significance. But you can do that. You just need to know whether you're peeking or not, which everybody does.
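
To make the "different thresholds" point concrete, here is a rough simulation sketch, not anyone's production method. It assumes an idealized A/A test (no real effect) whose test statistic is a z-score checked at 5 equally spaced looks. The function name, batch size, and trial count are made up for illustration. The value 2.413 is the tabulated Pocock constant boundary for 5 looks at a two-sided 5% level.

```python
import math
import random

def false_positive_rate(z_threshold, looks=5, batch=200, trials=20_000, seed=0):
    """Share of simulated A/A tests that cross z_threshold at any look."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total, n = 0.0, 0
        for _ in range(looks):
            # The sum of `batch` mean-0, unit-variance differences is N(0, batch),
            # so one draw stands in for a whole batch of new observations.
            total += rng.gauss(0.0, math.sqrt(batch))
            n += batch
            if abs(total / math.sqrt(n)) > z_threshold:
                hits += 1
                break  # early stopping: "significant" at this peek
    return hits / trials

naive = false_positive_rate(1.96)    # nominal 5% threshold reused at every peek
pocock = false_positive_rate(2.413)  # stricter constant threshold for 5 planned looks
print(f"naive peeking: {naive:.3f}   adjusted threshold: {pocock:.3f}")
```

Under these assumptions, reusing 1.96 at every peek should give a false positive rate well above the nominal 5%, while the stricter threshold should bring it back to roughly 5%.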


beejiu 674 days ago

> If you peek, you really have to commit to running the test for the full duration no matter what.

It's more complicated, but you can also run sequential A/B tests using [SPRT](https://en.wikipedia.org/wiki/Sequential_probability_ratio_t...) or similar, where a test gets accepted or rejected once it hits a threshold. I won't go into the details, but you can incrementally calculate the test statistic, so if a variant is performing very badly or very well, the test will end early.

One product team I worked on ran all tests as sequential tests. If you build a framework around this, I'd argue it's easier for statistics-unaware stakeholders to understand when you _can_ end a test early.
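
For readers who want the shape of the idea, here is a minimal sketch of Wald's SPRT, simplified to a single stream of conversions rather than a full two-arm A/B comparison. The function name, the hypothesized rates `p0` and `p1`, and the default error rates are illustrative assumptions, not what that team used.

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.20):
    """Wald's SPRT for a Bernoulli rate: H0 says rate = p0, H1 says rate = p1.

    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: accept H1
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0  # running log-likelihood ratio, updated one observation at a time
    n = 0
    for n, converted in enumerate(observations, start=1):
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n  # no threshold hit yet: keep collecting data

# A stream with no conversions at all ends early in favor of the lower rate.
print(sprt([0] * 50, p0=0.10, p1=0.15))
```

The statistic is a running sum, so each new observation costs one addition and two comparisons, which is what makes checking after every data point legitimate here.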


aflag 674 days ago

If there is a bug, then the experiment needs to be called off and a new one constructed. You shouldn't change anything else during the execution of the experiment.
