| HN Mirror



	by xyhopguy 2964 days ago
	I think that's how you derive UCB, except that you optimize cumulative regret rather than finding the probability distribution directly. Please correct me if I'm wrong.
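
	To make the "cumulative regret" part concrete, here is a rough sketch of UCB1, not a definitive implementation. Everything specific in it is an assumption that the comment does not state: Bernoulli rewards, the standard sqrt(2 ln t / n_i) exploration bonus, and the hypothetical names `ucb1`, `arm_means`, and `horizon`. The regret it tracks is the expected (pseudo-)regret, computed from the true means that a real learner would not see.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms (illustrative assumption).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound:
            # empirical mean plus the sqrt(2 ln t / n_i) bonus.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.1, 0.9], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

	Nothing here computes a posterior over which arm is best. The algorithm only keeps a mean and a count per arm, and the bonus term is what keeps the cumulative regret growing slowly (logarithmically in the horizon, per the usual UCB1 analysis).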