Hacker News new | ask | show | jobs
by picomancer 2961 days ago
It seems like any method for solving this problem could be interpreted in a Bayesian way: At any time, you consider all the different possible distributions each arm could have, and assign each a probability which is how likely you think that distribution is to occur.

The probabilities are initialized to some value (the "prior"), then when you pull the arm, you get some new information, which you use to update the probabilities based on evidence.

It would be interesting to try to see if you could analytically solve this problem for a simple family of distributions. For example, assume each lever produces Gaussian results, but has an unknown mean and SD. Set the prior to be that the means are normally distributed with mean 0 and SD 1, and the SD's are exponentially distributed with mean 1.

2 comments

Your intuition is spot on. With this line of argument you will end up with Thompson Sampling [0]. TS is an old algorithm from the 1930s. It is disconcertingly effective, even when the underlying Bayesian assumptions are not correct. It is also extremely hard to analyze. This led to a weird state of affairs - TS gave excellent empirical performance but we could not make any theoretical statements about its efficacy. It is only very recently that this has changed.

[0] https://en.wikipedia.org/wiki/Thompson_sampling

I think that's how you derive UCB, but optimizing cumulative regret rather than finding the probability distribution directly. Pls correct me if I'm wrong