Hacker News new | ask | show | jobs
by noelwelsh 5127 days ago
I don't think the non-stochastic setting is entirely appropriate. What we've been discussing is a situation where the differences in conversion rates are constant in expectation, though absolute conversion rates vary. So it's basically a stochastic setting, which should allow more leverage. Also, as I recall for the non-stochastic setting the definition of regret changes so the regret bounds are not directly comparable.
1 comments

My take on btilly's problem was that the difference between the convergence rates is not constant but rather known to be bounded away from zero by some unknown constant. That is, there is some value epsilon such that the difference in conversion rate is on every round at least epsilon. However, at some rounds it could be more than epsilon, and it doesn't necessarily vary according to any fixed distribution. I think this is therefore roughly equivalent to the non stochastic setting additionally assuming the best arm has large (relative) reward.

I agree though that if the difference in conversion rate were stochastic that could potentially make the problem much easier.