| HN Mirror

You know, after disbelieving what I said about A/B testing only to find out that I was right, you might have adjusted your expectations of me. I've written formal proofs before. I have published papers in mathematics. I know what's involved. Please spare me the lecture.

In this case none of the three possibilities that you stated are correct. I have an algorithm about which the following statement is true:

Suppose that users arrive according to a Poisson process. Suppose further that the random reward function of putting them in to different versions is variable but satisfies the following properties:

1. There is a static upper bound on the expected value of each version.

2. There is a static upper bound on the variance of each version.

3. The cumulative distribution of the reward function is integrable. (We don't even need continuity!)

4. One of the versions at all times has an expected return that is at least epsilon greater than all other versions

Then I have a MAB algorithm which achieves logarithmic regret in the long run. It is only worse by a constant factor than MAB algorithms for the case with fixed returns. That factor is likely to be somewhere in the neighborhood of sqrt(2), but I can't guarantee that.

Those are limit statements, but you can get more concrete bounds in the finite case. In principle it should be possible to derive an explicit formula - you tell me the bounds on expected value and variance for the versions, and the size of epsilon, and I can put an explicit bound on the odds that the best version is winning after N observations.

The principle behind the proof is exactly what it was for the A/B test case. The assumption of Poisson processes allows us to subselect equal samples from the versions for the purposes of statistical inference, and make provable statistical statements about how the behavior of the statistical sample allows us to make inferences about which version is best, even though actual conversion behavior is constantly changing.