by PanoptesYC
260 days ago
Yes. The paper describes the basic model as follows: "We consider the basic model with IID rewards, called stochastic bandits. An algorithm has K possible actions to choose from, a.k.a. arms, and there are T rounds, for some known K and T. In each round, the algorithm chooses an arm and collects a reward for this arm. The algorithm’s goal is to maximize its total reward over the T rounds."
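If it helps to see that loop concretely, here is a rough Python sketch of the setup in the quote. It is not from the paper: the Bernoulli (0/1) reward distribution, the names `run_bandit` and `uniform_policy`, and the example arm means are all my own assumptions for illustration. The quote only requires that each arm's rewards are IID.

```python
import random


def run_bandit(means, T, choose_arm, seed=0):
    """Play T rounds against K = len(means) arms and return the total reward.

    Assumption (not from the paper): each arm pays a Bernoulli reward,
    1.0 with probability means[arm] and 0.0 otherwise, IID across rounds.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K      # how many times each arm has been pulled
    totals = [0.0] * K    # sum of rewards collected from each arm
    total_reward = 0.0
    for t in range(T):
        # The algorithm only sees its own history, never the true means.
        arm = choose_arm(t, counts, totals, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        total_reward += reward
    return total_reward


def uniform_policy(t, counts, totals, rng):
    """Hypothetical baseline: pick an arm uniformly at random each round."""
    return rng.randrange(len(counts))


if __name__ == "__main__":
    # Three arms with made-up success probabilities, T = 1000 rounds.
    print(run_bandit([0.2, 0.5, 0.8], 1000, uniform_policy))
```

The uniform policy ignores `counts` and `totals` entirely, so it is only a baseline. Any smarter algorithm would plug in as a different `choose_arm` that uses that history to favor arms that have paid well so far.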