by ReadingInBed 3727 days ago
I thought this was a pretty good follow-up that shows the strengths and weaknesses of this approach: https://vwo.com/blog/multi-armed-bandit-algorithm/. Personally, I think this approach makes a lot more sense than A/B testing, especially since people often hand off the methodology to a 3rd party without knowing exactly how it works.
2 comments

Here are 2 good articles that follow up on the arguments presented by VWO in that article.

From the first link below: "They do make a compelling case that A/B testing is superior to one particular not very good bandit algorithm, because that particular algorithm does not take into account statistical significance.

However, there are bandit algorithms that account for statistical significance."

* https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs...

* https://www.chrisstucchio.com/blog/2015/dont_use_bandits.htm...

Chris is now VWO's director of data science. We recently overhauled our stats. Here's a quick summary for that: https://vwo.com/blog/smartstats-testing-for-truth/
Interesting. Lack of Bayesian testing is one of the reasons I've never considered using VWO - nice to see that's now your main methodology.
Changing your users' UI always bears some cost, though, so I'm not sure it's really the smart choice.
The points raised are valid; whether they matter is a different question.

Even in the tests shown, conversion rate was higher for the MAB algorithms than for simple A/B testing. "Oh but you get higher statistical significance!" Thanks, but that doesn't pay my bills, conversion pays.

Careful. It wasn't always higher for MAB even though the tables shown there make it appear so at first.

Those tables are showing the conversion rate during the test, up to the time when statistical significance is achieved. You generally then stop the test and go with the winning option for all your traffic.

In the two-way test where the two paths have real conversion rates of 10% and 20%, all of the MAB variations did win. Here is how many conversions there would be after 10000 visitors for that test, and how they compare to the A/B test:

  RAND   1988
  MAB-10 1997  +9
  MAB-24 2001 +13
  MAB-50 1996  +8
  MAB-90 1994  +6
For the three-way test where the three paths have real rates of 10%, 15%, and 20%, here is how many conversions there would be after 10000 visitors:

  RAND   1987
  MAB-10 1969 -18
  MAB-50 1987  +0
  MAB-77 1988  +1
Note that MAB-10 loses compared to RAND this time.

(The third column in the above two tables remains the same if you change 10000 to something else, as long as that something else is large enough that every test has reached significance. MAB-10 beats RAND in the first test by 9 conversions, and loses by 18 conversions in the second test.)
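To make the arithmetic behind those totals concrete, here is a minimal Python sketch. It assumes each test runs until significance and then all remaining traffic goes to the 20% winner. The visitor and conversion counts during the test are hypothetical placeholders, not the figures from the VWO article, and `total_conversions` is a name made up for this illustration.

```python
from fractions import Fraction


def total_conversions(test_visitors, test_conversions, total_visitors, winner_rate):
    """Conversions seen during the test plus expected conversions afterwards,
    when every remaining visitor is sent to the winning variation."""
    remaining = total_visitors - test_visitors
    if remaining < 0:
        raise ValueError("total_visitors must cover the whole test")
    return test_conversions + remaining * winner_rate


# Exact fraction so the comparison below is not blurred by float rounding.
WINNER_RATE = Fraction(1, 5)  # the 20% path

# Hypothetical test outcomes (NOT taken from the article):
# (visitors needed to reach significance, conversions during the test)
RAND_TEST = (200, 30)
MAB_TEST = (400, 75)


def mab_minus_rand(total_visitors):
    """Difference in total conversions, MAB minus RAND, after total_visitors."""
    rand = total_conversions(*RAND_TEST, total_visitors, WINNER_RATE)
    mab = total_conversions(*MAB_TEST, total_visitors, WINNER_RATE)
    return mab - rand


if __name__ == "__main__":
    for n in (10_000, 50_000):
        print(n, mab_minus_rand(n))
```

The total number of visitors cancels out of the difference, which is why the third column does not move once both tests have finished: only what happened during the test distinguishes the two methods.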

> up to the time when statistical significance is achieved. You generally then stop the test

Just a note, don't literally do this:

http://conversionxl.com/statistical-significance-does-not-eq...

Just to reiterate, this violates the assumptions under which you get your p-values.
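A rough way to see the problem is to simulate A/A tests, where both variations have the same true rate and any "significant" result is a false positive. The sketch below is only an illustration under assumed settings (10% conversion rate, 1000 visitors per arm, a peek every 50 visitors, a two-proportion z-test at about the 5% level); the function names are made up for this example.

```python
import math
import random


def z_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-sided two-proportion z-test at roughly the 5% level."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False  # no variance yet, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(conv_a / n_a - conv_b / n_b) / se > z_crit


def false_positive_rates(sims=400, visitors_per_arm=1000, peek_every=50,
                         rate=0.1, seed=0):
    """Return (rate with peeking, rate with one look at the end) for A/A tests."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        stopped_early = False
        for n in range(1, visitors_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # "Peeking": declare a winner the first time the test looks significant.
            if n % peek_every == 0 and not stopped_early:
                stopped_early = z_significant(conv_a, n, conv_b, n)
        peeking_hits += stopped_early
        # Fixed horizon: a single look once all visitors are in.
        fixed_hits += z_significant(conv_a, visitors_per_arm,
                                    conv_b, visitors_per_arm)
    return peeking_hits / sims, fixed_hits / sims


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positives with peeking: {peeking:.1%}")
    print(f"false positives, single look: {fixed:.1%}")
```

Because the last peek coincides with the fixed horizon, the peeking rate can never be lower than the single-look rate, and in practice it tends to be several times higher than the nominal 5%.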

I want an A/B testing tool that won't let you see results until it's done.

Interesting

This suggests to me that, as with a lot of algorithms, you might want to change your parameters during training.

So start with MAB-100 (RAND) and then decrease that % over time
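A minimal sketch of that idea, assuming MAB-X means an epsilon-greedy bandit that explores X% of the time (which is how MAB-100 comes out equal to RAND). The decay schedule, the `half_life` parameter, and the function names are hypothetical choices for illustration, not something proposed in the article.

```python
import random


def epsilon(t, half_life=500):
    """Exploration share at visitor t: 1.0 at t=0 (pure random, i.e. MAB-100),
    0.5 at t=half_life, and shrinking toward 0 after that."""
    return half_life / (half_life + t)


def run_decaying_bandit(true_rates, visitors=10_000, seed=0):
    """Epsilon-greedy with a decaying exploration share.
    Returns (pulls per arm, conversions per arm)."""
    rng = random.Random(seed)
    arms = range(len(true_rates))
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for t in range(visitors):
        if rng.random() < epsilon(t):
            arm = rng.randrange(len(true_rates))  # explore: pick any arm
        else:
            # exploit: pick the arm with the best observed conversion rate
            arm = max(arms, key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = run_decaying_bandit([0.10, 0.20])
    print("pulls:", pulls)
    print("conversions:", wins)
```

How fast to decay is the real tuning question: decay too quickly and the bandit can lock onto the wrong arm early, too slowly and it behaves like a plain random split for most of the test.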

The counterargument would be: in the long run, random positive changes in clicks don't pay the bills. Systemic changes do.
Yes, and you don't get systemic change with A/B testing

I find it funny when some people think A/B (or MAB) testing will solve major usability problems

You can definitely try Genetic Programming your website to conversion, it's probably going to be fun to watch