| HN Mirror



	by PanoptesYC 260 days ago
	Yes. The paper describes the basic model as follows: "We consider the basic model with IID rewards, called stochastic bandits. An algorithm has K possible actions to choose from, a.k.a. arms, and there are T rounds, for some known K and T. In each round, the algorithm chooses an arm and collects a reward for this arm. The algorithm’s goal is to maximize its total reward over the T rounds."
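To make the quoted setup concrete, here is a rough sketch of that protocol in Python. It is not from the paper: the Bernoulli reward distribution, the names `run_bandit`, `uniform_policy`, and `always_arm`, and the policy signature are all assumptions chosen for illustration. The quote only requires that each arm's rewards are IID.

```python
import random


def run_bandit(means, T, choose_arm, seed=0):
    """Play T rounds against K = len(means) arms; return (total_reward, pulls).

    Assumption: each arm pays 1 with probability means[arm], else 0 (Bernoulli).
    """
    rng = random.Random(seed)
    K = len(means)
    pulls = [0] * K      # how many times each arm has been chosen
    totals = [0.0] * K   # summed reward collected from each arm
    total_reward = 0.0
    for t in range(T):
        # The algorithm sees only its own history, never the true means.
        arm = choose_arm(t, pulls, totals, rng)
        # IID reward: an independent draw from the chosen arm's fixed distribution.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        total_reward += reward
    return total_reward, pulls


def uniform_policy(t, pulls, totals, rng):
    """Hypothetical baseline: pick an arm uniformly at random every round."""
    return rng.randrange(len(pulls))


def always_arm(i):
    """Hypothetical policy that ignores history and always plays arm i."""
    return lambda t, pulls, totals, rng: i


if __name__ == "__main__":
    total, pulls = run_bandit([0.2, 0.5, 0.8], T=1000, choose_arm=uniform_policy)
    print(total, pulls)
```

Any real bandit algorithm would slot in as a different `choose_arm` that uses `pulls` and `totals` to favor arms that have paid well so far. The uniform baseline is just there to show the loop: K arms, T rounds, one arm chosen and one reward collected per round, with total reward as the thing to maximize.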