| HN Mirror

My take on btilly's problem was that the difference between the convergence rates is not constant but rather known to be bounded away from zero by some unknown constant. That is, there is some value epsilon such that the difference in conversion rate is on every round at least epsilon. However, at some rounds it could be more than epsilon, and it doesn't necessarily vary according to any fixed distribution. I think this is therefore roughly equivalent to the non stochastic setting additionally assuming the best arm has large (relative) reward.

I agree though that if the difference in conversion rate were stochastic that could potentially make the problem much easier.