by equark
5388 days ago
Somebody really needs to write a Bayesian takedown of all these A/B testing articles. A/B testing is a Bayesian decision problem. There's really no other way to think about it. Determining sample size and frequentist confidence intervals are only relevant insofar as they approximate Bayesian concepts. The issue is the proper tradeoff between exploration and exploitation. What drives the decision is outstanding uncertainty conditional on the data observed (not conditional on the null hypothesis of zero effect and some non-sequential iid sampling process), the discount rate (which is totally absent in this article), and the reward structure (which is not a Type I and Type II error). The absurdity of the frequentist approach is clear from the admonition not to look at the results of the tests too often.
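To make "uncertainty conditional on the data observed" a bit more concrete, here is a minimal sketch, not anything from the article. It assumes independent uniform Beta(1, 1) priors on each variant's conversion rate and estimates by Monte Carlo the posterior probability that B converts better than A. The function name `prob_b_beats_a` and its parameters are made up for illustration, and the discount rate and reward structure mentioned above are deliberately left out.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A | data) under assumed Beta(1, 1) priors.

    conv_x is the number of conversions and n_x the number of visitors
    for variant x. With a Beta(1, 1) prior and binomial data, the
    posterior for each rate is Beta(1 + conversions, 1 + failures).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        theta_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 50/1000 conversions for A, 80/1000 for B.
    print(prob_b_beats_a(50, 1000, 80, 1000))
```

This number can be recomputed after every observation without changing what it means, which is the contrast being drawn with the "don't look too often" rule. Turning it into a stopping decision would still need the reward structure and discounting, which this sketch does not model.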
I'm sure you're aware of this, but I'm just trying to clarify the idea for other readers.
The idea is not well-illustrated in the article. (Although the article does provide some usable guidance until the whole Bayesian framework gets built and populated with correct parameters, like the reward structure.)
So, to be concrete -- Suppose you're flipping coins and you figure (by some procedure) you need 100 flips to reach significance. By the 70th flip, you observe that p(head) ~= 40/70 ~= 57%, so you decide to stop the test because clearly you're not dealing with a 50/50 coin. That's not OK, because you'll always see favorable and unfavorable excursions in a series of coin flips -- if you choose to stop in the middle of such an excursion, you'll bias the result. You've made the stopping time dependent on the observed values.
In some situations you can do this (it's related to http://en.wikipedia.org/wiki/Optional_stopping_theorem), but the way that I described above is not one of them.
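A small simulation can show the bias from stopping mid-excursion. This is only a sketch under assumptions that are not in the comment above: a fair coin, a two-sided z-test at the 5% level (critical value 1.96), at most 100 flips, and a peek after every flip starting from flip 10. The names `is_significant` and `false_positive_rates` are invented for the example.

```python
import math
import random


def is_significant(heads, n, z_crit=1.96):
    """Two-sided z-test of 'the coin is fair' after n flips."""
    z = (heads - n / 2) / math.sqrt(n / 4)
    return abs(z) > z_crit


def false_positive_rates(n_flips=100, min_flips=10, trials=5000, seed=0):
    """Return (fixed_rate, peeking_rate) for a coin that really is fair.

    fixed_rate:   fraction of runs significant when tested once, at n_flips.
    peeking_rate: fraction of runs significant at ANY look from min_flips on,
                  i.e. runs where a peeker would have stopped and declared
                  the coin biased.
    """
    rng = random.Random(seed)
    fixed = 0
    peeking = 0
    for _ in range(trials):
        heads = 0
        ever_significant = False
        for n in range(1, n_flips + 1):
            if rng.random() < 0.5:
                heads += 1
            if n >= min_flips and is_significant(heads, n):
                ever_significant = True
        if is_significant(heads, n_flips):
            fixed += 1
        if ever_significant:
            peeking += 1
    return fixed / trials, peeking / trials


if __name__ == "__main__":
    fixed_rate, peeking_rate = false_positive_rates()
    print("test once at the end:", fixed_rate)
    print("stop at first significant peek:", peeking_rate)
```

Since the coin is fair, every "significant" result here is a false positive. The peeking rate should come out well above the single-test rate, because stopping the first time an excursion crosses the threshold makes the stopping time depend on the observed values.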