Hacker News new | ask | show | jobs
by EvanMiller 4539 days ago
I won't comment on the virtues of bandit algorithms, but the author grossly mischaracterizes the requirements of a multivariate A/B test. Just do a regular 50/50 split and run a regression on the outcome using the observable characteristics as explanatory variables. Then take that model and feed the observables back in to get an individualized prediction as to whether A or B will perform better. You can bin continuous variables or you can including higher-order terms to achieve a polynomial fit. The data requirements really aren't that large.

I believe the author's confusion stems from attempting to model every interaction term, e.g. if you think there will be some extra effect to being Male and using Windows and being Swedish that is more than the "Male effect" and "Windows effect" and the "Sweden effect" considered individually.

I wrote a little article on implementing a simple multivariate A/B test from scratch:

http://www.evanmiller.org/linear-regression-for-fun-and-prof...

All you really need is a function that inverts a matrix and you're good to go. The cool part is that you can actually throw away most of the data that comes in from the firehose, just keep updating the Hessian until you're ready to invert it and bam, you have a model.

2 comments

(Synference cofounder here)

That's a good blog post.

There are many ways to handle multi-variate data, with different tradeoffs. The linear model you describe in your post is one of the classic machine learning models often used. In the post, we address the shortcoming of such models in the subsection "Trading off exploration vs. Exploitation."

As we point out, such a model ignores the trade-off between exploration and exploitation. That means that you have to gather all your data by randomly serving variants, before you can figure out the right variant to serve to each user.

That's very expensive in terms of serving-the-wrong-variant, if what you are trying to predict is conversion or revenue. By contrast, our method will quickly realise if a particular variant has a very high expected value for the current user (with the current user's particular combination of features), and so serve that user that variant - even if we aren't confident about other users.

So, like any bandit algorithm, we trade off exploration and exploitation; but further, we trade this off not just across the population as a whole, but on a personalised basis. That's a very important functionality that the approach you describe doesn't have.

Why is this an issue? Because if you have to gather data by randomly sampling, what if a certain sort of user doesn't appear very often? (e.g. lets say for a particular site, that's a 65 year old female from Canada). Do you randomly sample until you see a user like this? If so, you are randomly sampling for a long time. Or do you stop your exploration before you see rare users? This totally neglects your rarer users. What if rare users have unusual preferences, or are particularly valuable? (E.g. maybe your rare users are your biggest spenders - which happens.)

And, if you have high dimensional data, all your users are effectively rare (because you rarely see users with that exact combination of features).

Furthermore, the 'gather data randomly' approach is also impractical if user behaviour is dynamic, as it often is in real-world settings. (i.e. the preference of your user base as a whole changes - maybe they've all recently read an article, so a different marketing copy now resonates with them)

Finally, sometimes those interactions you mention, between different features, are really important, and shouldn't be neglected.

I could go on - there's a lot of subtle issues that the approach we are using solves that simple multivariate testing doesn't handle, basically.

What we're doing is trying to provide an API that wraps all this knowledge up into a service that's easy for people to use.

Your blog post makes an important mistake in that it recommends actually inverting the matrix X^T X, which one never actually does in this case. In general, A^{-1} b is sort of mathematical "slang" for "a solution to the linear system Ax = b", not "compute A^{-1} and multiply b by it".

This is discussed in many places, but here are a few examples:

http://www.johndcook.com/blog/2010/01/19/dont-invert-that-ma... http://en.wikipedia.org/wiki/Linear_least_squares_(mathemati...