by feral 4539 days ago
(Synference cofounder here)

It has successfully increased Wikipedia's donation revenue; however, some fundamental assumptions baked into the AB-testing approach are all wrong; this leaves a lot of money on the table.

Instead of trying to find a single best version for everyone, people should realise that different segments of their users are going to have different preferences. Machine learning can find these in an automated way, and there's a lot of value to be gained - but people need to see past the standard 'one size fits all' AB-testing approach to take advantage of it.
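
To make the segment point concrete, here is a minimal sketch in Python. All of the counts are invented for illustration (none come from Wikipedia or Synference), and the segment labels, variant names and helper functions are hypothetical. It only shows how an overall winner can differ from the per-segment winners; it is not a description of any real system.

```python
# Hypothetical data, invented purely for illustration:
# (segment, banner variant) -> (impressions, donations).
results = {
    ("US", "A"): (10_000, 300),
    ("US", "B"): (10_000, 200),
    ("UK", "A"): (5_000, 50),
    ("UK", "B"): (5_000, 125),
}


def rate(impressions, donations):
    """Donation rate per impression."""
    return donations / impressions


def overall_winner(results):
    """Pool all segments together, as a plain A/B test would."""
    totals = {}
    for (_segment, variant), (n, k) in results.items():
        imp, don = totals.get(variant, (0, 0))
        totals[variant] = (imp + n, don + k)
    return max(totals, key=lambda v: rate(*totals[v]))


def per_segment_winners(results):
    """Pick the best variant separately within each segment."""
    best = {}
    for (segment, variant), (n, k) in results.items():
        current = best.get(segment)
        if current is None or rate(n, k) > rate(*results[(segment, current)]):
            best[segment] = variant
    return best


print(overall_winner(results))
print(per_segment_winners(results))
```

With these made-up numbers, banner A wins when everything is pooled, while B wins inside the UK segment, so shipping A to everyone would hide the UK preference.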

3 comments

There's a huge difference between "all wrong" and "has some fundamental assumptions that could lead to non-optimal results if you wanted to push the results even further". The former implies, upon initial reading of the headline, that whatever Wikipedia is doing doesn't even give you a better solution.

A better headline might have been "Hidden assumptions in Wikipedia's A/B testing and how it could be improved". Also, why are we picking on Wikipedia here? Don't a lot of companies claim to do A/B testing and isn't this a problem inherent in all simple A/B testing?

AB-testing is much better than doing nothing.

But you have to be careful with any powerful tool, in case its success blinds you to its weaknesses. "When all you have is a hammer, everything starts to look like a nail". We think that's happening with AB-testing at the moment.

We think Wikipedia is awesome, and would love them to get more donations by using a more sophisticated approach.

Yes, most companies doing AB-testing could benefit, provided they have the ability to personalise their user experience (i.e. they aren't just trying to quickly find the single best UI).

However, Wikipedia is a really good example to start with. The phrase 'Wikipedia needs those nickels' resonates well with US donors and will probably work in Canada, but what about the UK? Australia?

It's obvious once it's pointed out - but wouldn't it be better if the system automatically realised this? And considered all the combinations? That's our point.

Yeah, that's fair enough, but a 50% increase in revenue per impression is hardly bad. You can make the argument that it could be better, and I agree with that sentiment in the article. But one of the first things to bear in mind when optimising is to always go for the low-hanging fruit first.

Although it's just my guesstimate, I imagine that the broad approach to A/B testing (or just doing it at all in the first place!) will be the most important and easiest fruit to grab, whilst a multi-armed bandit solution will probably only provide much smaller returns on top. So given a choice between a Wikipedia dev concentrating on improving the overall message and trying to improve 5 or more cohorts at once, I'd go for the easier option until it stops giving significant returns.

That's not to say it shouldn't be done eventually. Like you said, it's potential money left on the table!
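
For readers unfamiliar with the term, a per-cohort multi-armed bandit can be sketched roughly as below. This is an epsilon-greedy variant chosen for brevity, not necessarily what anyone in this thread uses; the class name, the segment labels and the `epsilon` default are all assumptions made for illustration.

```python
import random


class SegmentedEpsilonGreedy:
    """Keeps separate epsilon-greedy statistics for each user segment.

    Hypothetical illustration only: names and defaults are assumptions.
    """

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.stats = {}  # segment -> {variant: [shows, donations]}

    def _segment_stats(self, segment):
        # Lazily create counters the first time a segment is seen.
        return self.stats.setdefault(
            segment, {v: [0, 0] for v in self.variants}
        )

    def choose(self, segment):
        stats = self._segment_stats(segment)
        # Show every variant at least once before comparing rates.
        untried = [v for v in self.variants if stats[v][0] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        return max(self.variants, key=lambda v: stats[v][1] / stats[v][0])

    def update(self, segment, variant, donated):
        counters = self._segment_stats(segment)[variant]
        counters[0] += 1
        counters[1] += int(donated)
```

In use, each page view would call `choose(segment)` to pick a banner and `update(segment, variant, donated)` once the outcome is known. Each cohort then converges on its own best message, at the cost of needing enough traffic per cohort to learn anything.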

I believe the word for what you were doing here is "marketrolling".