Hacker News
by xyhopguy, 2964 days ago
I think that's how you derive UCB (upper confidence bound), except that you optimize cumulative regret rather than finding the probability distribution directly. Please correct me if I'm wrong.
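To make the regret framing concrete, here is a rough sketch of UCB1, one common member of the UCB family, tracking cumulative pseudo-regret on a toy bandit. It may not be the exact variant the parent has in mind. The Bernoulli arms, the `means` values, the `sqrt(2 ln t / n)` bonus, and the function name `ucb1` are all my assumptions for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with hypothetical true means.

    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # total observed reward per arm
    best = max(means)   # used only to measure regret, never to choose an arm
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once so each mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the highest empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected loss versus always playing the best arm.
        regret += best - means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 1))
```

Note that the algorithm never estimates a full distribution over which arm is best. It only keeps a mean and a pull count per arm, and the bonus term is what keeps the cumulative regret growing slowly.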