by PanoptesYC 260 days ago
Yes. The paper explains the basic model as follows:

"We consider the basic model with IID rewards, called stochastic bandits. An algorithm has K possible actions to choose from, a.k.a. arms, and there are T rounds, for some known K and T. In each round, the algorithm chooses an arm and collects a reward for this arm. The algorithm’s goal is to maximize its total reward over the T rounds."
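
To make the quoted setup concrete, here is a rough sketch of that interaction loop in Python. It is not from the paper: the Bernoulli rewards, the names `run_bandit` and `round_robin`, and the example means are all my own assumptions, and round-robin is only a placeholder policy, not a good bandit algorithm.

```python
import random

def run_bandit(means, T, choose_arm, seed=0):
    """Simulate T rounds of a stochastic bandit.

    Assumption (not from the paper): each arm pays a Bernoulli reward,
    where means[a] is the hidden success probability of arm a.
    Rewards are IID per arm. choose_arm(t, history, K) returns the
    index of the arm to play in round t.
    """
    rng = random.Random(seed)
    K = len(means)
    history = []  # (arm, reward) pairs the algorithm has observed so far
    total = 0
    for t in range(T):
        arm = choose_arm(t, history, K)
        # Draw this round's reward independently from the chosen arm.
        reward = 1 if rng.random() < means[arm] else 0
        history.append((arm, reward))
        total += reward
    return total, history

def round_robin(t, history, K):
    """Placeholder policy: cycle through the arms and ignore feedback."""
    return t % K

# Hypothetical instance: K = 3 arms, T = 1000 rounds.
total, history = run_bandit([0.2, 0.5, 0.8], T=1000, choose_arm=round_robin)
print("total reward:", total)
```

A real algorithm would use `history` to favor arms that have paid well so far. The point here is just the protocol: pick an arm, see only that arm's reward, repeat for T rounds, and try to make the total as large as possible.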