by picomancer
2961 days ago
It seems like any method for solving this problem could be interpreted in a Bayesian way. At any time, you consider all the possible distributions each arm could have and assign each one a probability reflecting how likely you think it is. The probabilities are initialized to some value (the "prior"); each time you pull an arm, you get new information and use it to update that arm's probabilities.

It would be interesting to see whether this problem can be solved analytically for a simple family of distributions. For example, assume each lever produces Gaussian results with an unknown mean and SD. Set the prior so that the means are normally distributed with mean 0 and SD 1, and the SDs are exponentially distributed with mean 1.
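To make the prior-then-update idea concrete, here is a rough numerical sketch rather than an analytic solution. It assumes the mean and SD priors are independent, and it approximates the posterior on a coarse grid. The grid ranges, the class name `ArmPosterior`, and the helpers `thompson_step` and `simulate` are all made up for illustration. Arms are chosen by drawing one (mean, SD) pair per arm from its posterior and pulling the arm with the largest drawn mean, which is the Thompson sampling rule.

```python
import math
import random

# Hypothetical grid over (mean, SD); the ranges and step sizes are assumptions.
MEANS = [-4 + 0.1 * i for i in range(81)]   # -4.0 .. 4.0
SDS = [0.05 + 0.1 * i for i in range(50)]   # 0.05 .. 4.95


def log_prior(mu, sigma):
    # mean ~ Normal(0, 1), SD ~ Exponential(mean 1), assumed independent.
    return -0.5 * mu * mu - sigma


def log_likelihood(x, mu, sigma):
    # Gaussian log density, dropping the constant term.
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


class ArmPosterior:
    """Unnormalized log posterior over a (mean, SD) grid for one arm."""

    def __init__(self):
        self.grid = [(m, s) for m in MEANS for s in SDS]
        self.logp = [log_prior(m, s) for m, s in self.grid]

    def update(self, x):
        # Bayes' rule in log space: add the log likelihood of the new reward.
        self.logp = [lp + log_likelihood(x, m, s)
                     for lp, (m, s) in zip(self.logp, self.grid)]

    def weights(self):
        # Subtract the max before exponentiating for numerical stability.
        top = max(self.logp)
        w = [math.exp(lp - top) for lp in self.logp]
        total = sum(w)
        return [v / total for v in w]

    def posterior_mean(self):
        return sum(w * m for w, (m, _) in zip(self.weights(), self.grid))

    def sample(self, rng):
        # Draw one (mean, SD) pair from the grid posterior.
        return rng.choices(self.grid, weights=self.weights(), k=1)[0]


def thompson_step(arms, rng):
    # Sample a plausible mean for each arm and pick the largest.
    draws = [arm.sample(rng)[0] for arm in arms]
    return max(range(len(arms)), key=draws.__getitem__)


def simulate(true_params, rounds=100, seed=0):
    rng = random.Random(seed)
    arms = [ArmPosterior() for _ in true_params]
    pulls = [0] * len(true_params)
    for _ in range(rounds):
        i = thompson_step(arms, rng)
        mu, sigma = true_params[i]
        arms[i].update(rng.gauss(mu, sigma))
        pulls[i] += 1
    return arms, pulls


if __name__ == "__main__":
    arms, pulls = simulate([(0.0, 1.0), (1.5, 1.0)])
    print("pulls per arm:", pulls)
    print("posterior means:", [round(a.posterior_mean(), 2) for a in arms])
```

The grid sidesteps the fact that a normal prior on the mean combined with an exponential prior on the SD is not a conjugate pair for Gaussian rewards, so a closed-form update is not obviously available. A finer grid or a sampler would be more accurate at higher cost.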
[0] https://en.wikipedia.org/wiki/Thompson_sampling