by yichijin
2894 days ago
Hey all, statistician from Optimizely chiming in here. Just wanted to point out that this is exactly the right point. I wanted to add one detail: there actually are ways to do early stopping while staying within a frequentist approach. For example, most clinical trial methods are not Bayesian but rather are fixed-horizon tests that have the allowable amount of Type 1 error "spread out" among multiple looks that are planned in advance. At Optimizely we essentially have a continuous version of this that does in fact allow for multiple looks with rigorous control of Type 1 error. As tedsanders mentions, the key upside is that if you start an experiment and it shows a larger-than-expected lift, you can terminate it early. Then, over many repeated experiments, you gain a lot in terms of average time to significance. The dissonance in this discussion mostly stems from the fact that this paper (which we actually collaborated on!) uses data from 2014, before we rolled out this new Stats Engine. For more, I would encourage a look at our paper: http://www.kdd.org/kdd2017/papers/view/peeking-at-ab-tests-w...
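To make the "spread out" idea concrete, here is a rough simulation sketch. It is not Optimizely's Stats Engine and not any specific clinical-trial design; it assumes a simple z-test on an A/A experiment (no true effect) with five planned looks, and it uses an even Bonferroni split of alpha across the looks, which is the crudest valid way to budget Type 1 error (real group-sequential designs typically use less conservative boundaries such as Pocock or O'Brien-Fleming). The function name and all parameter values are made up for illustration.

```python
import math
import random
from statistics import NormalDist


def false_positive_rate(per_look_alpha, looks=5, block=100, sims=4000, seed=0):
    """Simulate A/A tests and stop at the first look that appears significant.

    Hypothetical setup: each observation is the A-minus-B difference with
    mean 0 and variance 1, so any rejection is a false positive.
    """
    rng = random.Random(seed)
    # Two-sided critical value applied at every look.
    z_crit = NormalDist().inv_cdf(1 - per_look_alpha / 2)
    rejections = 0
    for _ in range(sims):
        total, n = 0.0, 0
        for _ in range(looks):
            # The sum of `block` zero-mean, unit-variance draws is N(0, block),
            # so one draw per look is enough to simulate the running total.
            total += rng.gauss(0.0, math.sqrt(block))
            n += block
            z = total / math.sqrt(n)
            if abs(z) > z_crit:
                rejections += 1
                break  # early stopping
    return rejections / sims


alpha = 0.05
looks = 5
naive = false_positive_rate(alpha, looks=looks)          # full alpha at every look
split = false_positive_rate(alpha / looks, looks=looks)  # alpha spread over looks

print(f"naive peeking false-positive rate: {naive:.3f}")
print(f"alpha split across looks:          {split:.3f}")
```

Under these assumptions, testing at the full alpha on every look should push the overall false-positive rate well above the nominal 5%, while the split version stays at or below it, at the cost of a stricter threshold per look.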
|
In fact, why use an inferential framework at all (estimating some sort of probability and using it to guide action), rather than directly using a policy learning framework, e.g. modeling this as a Q-learning or multi-armed bandit problem?
If at the end of the day you have some objective function (e.g. 'making money'), some known space of actions (e.g. move this widget up the page, change the color, engage with user this way), and a reasonable way to associate those two, then isn't the company literally doing reinforcement learning over time?
It seems one benefit of a reinforcement learning framework is that it maintains a set of actions that will still be explored in the future without forcing you to prematurely 'choose' whether A or B is actually better: if A is better in reality, then it will be explored more and more often and B will be progressively downweighted over time.
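As a rough sketch of that last point, here is a two-armed Bernoulli bandit using Thompson sampling. Thompson sampling is just one bandit algorithm picked for illustration (the comment above does not name one), and the conversion rates, round count, and function name are all hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Run Thompson sampling on Bernoulli arms and return pulls per arm.

    `true_rates` are the (hypothetical) conversion rates, unknown to the
    algorithm; it only ever sees the sampled successes and failures.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k
    losses = [0] * k
    for _ in range(rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm with the highest draw.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]


# Variant B (index 1) is assumed to convert better than variant A (index 0).
pulls = thompson_sampling([0.05, 0.10])
print("pulls per arm:", pulls)
```

No arm is ever formally declared the winner here. The weaker arm keeps a nonzero chance of being played, but its share of traffic shrinks as evidence against it accumulates.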