by taeric 1055 days ago
How well does that dodge the problem? I'd imagine a multi-armed bandit should settle into a state where it is always sampling from many fair coins, as it were. I would be delighted to read a study on that.

I can’t say that I worked out the proof, but intuitively I would expect the posterior distributions over the arm probabilities to converge to something roughly equal. The other option is spurious convergence to a bad posterior, which could maybe happen with poor sampling techniques, but I can’t imagine it’s more than an edge case.
Right, that is what I meant when I said it should continue to sample from fair coins. I don't know that I've seen experiments measuring how long that takes, though.
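
For anyone who wants to poke at this, here is a minimal sketch of such an experiment. It assumes Beta-Bernoulli Thompson sampling, which nobody in this thread actually named, and the function name, arm count, success rate, and round count are all made up for illustration. Every arm has the same true success rate, so the interesting outputs are how the pulls get split and how close the posterior means end up.

```python
import random


def thompson_equal_arms(n_arms=3, p=0.5, n_rounds=5000, seed=0):
    """Thompson sampling over arms that all share the same success rate p.

    Hypothetical setup: Bernoulli rewards with a uniform Beta(1, 1) prior
    on each arm. Returns (pull counts, posterior means) per arm.
    """
    rng = random.Random(seed)
    wins = [1] * n_arms    # Beta alpha parameter per arm
    losses = [1] * n_arms  # Beta beta parameter per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        draws = [rng.betavariate(wins[i], losses[i]) for i in range(n_arms)]
        # Play the arm whose draw is highest.
        arm = max(range(n_arms), key=draws.__getitem__)
        pulls[arm] += 1
        # Every arm is the same fair (or equally biased) coin.
        if rng.random() < p:
            wins[arm] += 1
        else:
            losses[arm] += 1

    means = [wins[i] / (wins[i] + losses[i]) for i in range(n_arms)]
    return pulls, means


if __name__ == "__main__":
    pulls, means = thompson_equal_arms()
    print("pulls per arm:  ", pulls)
    print("posterior means:", [round(m, 3) for m in means])
```

Rerunning with different seeds and round counts would give a rough feel for how long the allocation keeps wandering between arms, though a toy like this is no substitute for a real study.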

There is also the question of how long you'd leave multiple treatments out there. Presumably, even if there is no difference in outcomes, there can be benefits to having fewer deployed behaviors.

I'm now also curious if there are non-transitive situations. For example, three treatments that all look fair when deployed together, but where, for whatever reason, any two of them deployed alone will show a preference. Ideally, of course, treatments should be designed such that this can't happen, but mistakes are often made.

Edit: I fully concede that this is likely chasing edge cases. The motivation for fewer deployed arms is far more compelling than the edge cases.