Hacker News new | ask | show | jobs
by cmansley 5018 days ago
Ok, you are making a point about sampling. In periods of high traffic you have more samples that will bias the calculation. But, since the number of samples will be consistent between A and B, everything is fine. Fine.

I believe further discussion will not be productive. But, I suggest if you have a proof about A/B testing as it relates to the MAB problem or even if you have a proof about A/B testing in general that drops the normally distributed assumption or tells you exactly when to switch from exploring to exploiting, I suggest you write it up and publish it on arXv.

Thanks for your time.

1 comments

Yes, I was indeed making a point about sampling. Hopefully we're on the same page now.

But here is an honest question for you. Why do you think that it would make sense for me to try to write up and publish a paper on arXv?

My view is that doing so would take a considerable amount of work. And the real-world constraints that matter to my clients don't seem to be in a direction that academics care much about, so I can't see them getting particularly interested in it. So it does not seem like it positively impacts my life.

I say this as someone who has several publications to my name. This fact has only once mattered to me. That once was when I needed to get sign-off from my current employer to have something I did before they hired me get published. Unfortunately my employer at the moment was eBay, they had just purchased Skype, and the paper that I was publishing among other things implied that Skype was unlikely to be worth what eBay had paid for it. This was..not fun.

If some research mathematician particularly wanted to sit down, pick my brains, and try to formalize the real-world constraints I've observed in my clients, that would be fine by me. It would only be fair in that case for me to be listed as a co-author. But unless that happens, I'm not going to try to publish a formal paper.

I was encouraging you to write down your mathematical evidence of better performing algorithms, because I know from my experience that when I try to write down proofs, my assumptions and reasoning become clearer. And often a step in the proof or logic that seems trivial becomes less trivial once you actually try to lay out the proof.

I believe that one of two things will arise when you go to write up a mathematical proof. One, you will be dependent on a statistical test that makes a normal distribution assumption implicitly or explicitly. Two, you will dependent on a bound that only applies in the limit. There is a third possibility, which is the most interesting to me, which is that you are dependent on a period of stationarity in your data, in other words, the distribution you are measuring must be stationary.

The proofs in the MAB papers are not intentionally obtuse or ignoring mainstream statistical ideas. They are written that way to say very explicit things under well defined assumptions. Locking down assumptions and being very precise is what writing down a mathematical proof is all about.

You know, after disbelieving what I said about A/B testing only to find out that I was right, you might have adjusted your expectations of me. I've written formal proofs before. I have published papers in mathematics. I know what's involved. Please spare me the lecture.

In this case none of the three possibilities that you stated are correct. I have an algorithm about which the following statement is true:

Suppose that users arrive according to a Poisson process. Suppose further that the random reward function of putting them in to different versions is variable but satisfies the following properties:

1. There is a static upper bound on the expected value of each version.

2. There is a static upper bound on the variance of each version.

3. The cumulative distribution of the reward function is integrable. (We don't even need continuity!)

4. One of the versions at all times has an expected return that is at least epsilon greater than all other versions

Then I have a MAB algorithm which achieves logarithmic regret in the long run. It is only worse by a constant factor than MAB algorithms for the case with fixed returns. That factor is likely to be somewhere in the neighborhood of sqrt(2), but I can't guarantee that.

Those are limit statements, but you can get more concrete bounds in the finite case. In principle it should be possible to derive an explicit formula - you tell me the bounds on expected value and variance for the versions, and the size of epsilon, and I can put an explicit bound on the odds that the best version is winning after N observations.

The principle behind the proof is exactly what it was for the A/B test case. The assumption of Poisson processes allows us to subselect equal samples from the versions for the purposes of statistical inference, and make provable statistical statements about how the behavior of the statistical sample allows us to make inferences about which version is best, even though actual conversion behavior is constantly changing.