| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by noelwelsh 4539 days ago

I'm a big proponent of bandit algorithms (I'm a founder of Myna http://mynaweb.com/, something of a competitor to Synference). I think there is a bit of nuance here that is missing in the discussion.

Should Wikipedia use a contextual bandit system? Assuming the cost is equivalent, I say yes, for the following reasons:

- they have a shedload of traffic, and then some; and

- they have a limited time in which the campaign is running.

Let's address these in turn.

The regret bounds on contextual bandits are O(d sqrt(T)), where d is the dimensionality of the user profile you're using and T is time (see e.g. http://arxiv.org/abs/1209.3352). In contrast, the standard stochastic bandit has regret bounds of O(log(T)), which is much tighter. The summary is you need a lot more data to explore the bigger range of possibilities you've created. So I'm not sure contextual bandits are viable for smaller sites, but Wikipedia definitely has the traffic to make them work.

If your test is running for a limited time you can just delete the code when you're done. This has several advantages. People running A/B tests typically want to answer the question "what is best?" (To crush your enemies, obviously, but you can't usually say that) under the assumption that you'll then display the best variant forever. Contextual bandit algorithms don't answer this question; instead they minimise regret (equivalently, making you the most money) which makes more sense in the limited time setting. In fact there is no "best" with a contextual bandit -- the best variant depends on the user profile. This means you need to run the algorithm forever: good if you're collecting monthly subscriptions based on algorithm use, but bad if you're trying to manage a site with an exploding number of variants under test that you can never delete.

Ob plug: If you've found this post interesting you should consider signing up for the bandit course I'm running: http://noelwelsh.com/data/2013/08/16/free-bandit-course-draf... It will cover contextual bandits, and a whole heap of other interesting stuff.

1 comments

feral 4539 days ago

>The regret bounds on contextual bandits are O(d sqrt(T)), where d is the dimensionality of the user profile you're using and T is time (see e.g. http://arxiv.org/abs/1209.3352). In contrast, the standard stochastic bandit has regret bounds of O(log(T)), which is much tighter. The summary is you need a lot more data to explore the bigger range of possibilities you've created. So I'm not sure contextual bandits are viable for smaller sites, but Wikipedia definitely has the traffic to make them work.

Its definitely viable for smaller sites. Your argument there is about the bounds in a particular theoretical situation ("when the contexts are provided by an adaptive adversary.")

In a real world situation, your contexts are provided by the normal behaviour of your users, not an adversary. And, rather than the patterns being maximally hard to find, there'll instead be a lot of low-hanging-fruit - in terms of very obvious combinations of features that define different segments of users.

What's important is not the bound of the contextual bandit in a theoretical situation, but rather what performance it gives in 90% of business situations. In many situations, doing some personalisation to automatically find the most obvious user groups is a huge win.

And later, as the amount of data increases, the granularity of segmentation will effectively increase.

I agree with your general comments on bandits, and love what Myna is doing.