Hacker News new | ask | show | jobs
by feral 4539 days ago
>The regret bounds on contextual bandits are O(d sqrt(T)), where d is the dimensionality of the user profile you're using and T is time (see e.g. http://arxiv.org/abs/1209.3352). In contrast, the standard stochastic bandit has regret bounds of O(log(T)), which is much tighter. The summary is you need a lot more data to explore the bigger range of possibilities you've created. So I'm not sure contextual bandits are viable for smaller sites, but Wikipedia definitely has the traffic to make them work.

Its definitely viable for smaller sites. Your argument there is about the bounds in a particular theoretical situation ("when the contexts are provided by an adaptive adversary.")

In a real world situation, your contexts are provided by the normal behaviour of your users, not an adversary. And, rather than the patterns being maximally hard to find, there'll instead be a lot of low-hanging-fruit - in terms of very obvious combinations of features that define different segments of users.

What's important is not the bound of the contextual bandit in a theoretical situation, but rather what performance it gives in 90% of business situations. In many situations, doing some personalisation to automatically find the most obvious user groups is a huge win.

And later, as the amount of data increases, the granularity of segmentation will effectively increase.

I agree with your general comments on bandits, and love what Myna is doing.