
	by ReadingInBed 3727 days ago
	I thought this was a pretty good follow-up to show the strengths and weaknesses of this approach: https://vwo.com/blog/multi-armed-bandit-algorithm/. Personally, I think this approach makes a lot more sense than A/B testing, especially when people often hand off the methodology to a 3rd party without knowing exactly how it works.

2 comments

aidanf 3727 days ago

Here are 2 good articles that follow up on the arguments presented by VWO in that article.

From the first link below: "They do make a compelling case that A/B testing is superior to one particular not very good bandit algorithm, because that particular algorithm does not take into account statistical significance.

However, there are bandit algorithms that account for statistical significance."

* https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs...

* https://www.chrisstucchio.com/blog/2015/dont_use_bandits.htm...


paraschopra 3727 days ago

Chris is now VWO's director of data science. We recently overhauled our stats. Here's a quick summary for that: https://vwo.com/blog/smartstats-testing-for-truth/


thenomad 3727 days ago

Interesting. Lack of Bayesian testing is one of the reasons I've never considered using VWO - nice to see that's now your main methodology.


mkesper 3727 days ago

Changing your users' UI always bears some cost, though, so I'm not sure it's really the smart choice.


raverbashing 3727 days ago

The points raised are valid; whether they matter is a different question.

Even in the tests shown, the conversion rate was higher for the MAB algorithms than for simple A/B testing. "Oh, but you get higher statistical significance!" Thanks, but that doesn't pay my bills; conversion does.


tzs 3727 days ago

Careful. It wasn't always higher for MAB even though the tables shown there make it appear so at first.

Those tables are showing the conversion rate during the test, up to the time when statistical significance is achieved. You generally then stop the test and go with the winning option for all your traffic.

In the two-way test where the two paths have real conversion rates of 10% and 20%, all of the MAB variations did win. Here is how many conversions there would be after 10000 visitors for that test, and how they compare to the A/B test:

  Strategy  Conversions  vs. RAND
  RAND      1988
  MAB-10    1997          +9
  MAB-24    2001         +13
  MAB-50    1996          +8
  MAB-90    1994          +6

For the three-way test where the three paths have real rates of 10%, 15%, and 20%, here is how many conversions there would be after 10000 visitors:

  Strategy  Conversions  vs. RAND
  RAND      1987
  MAB-10    1969         -18
  MAB-50    1987          +0
  MAB-77    1988          +1

Note that MAB-10 loses compared to RAND this time.

(The third column in the above two tables stays the same if you change 10000 to something else, as long as that something else is large enough that every test has already reached significance. MAB-10 beats RAND in the first test by 9 conversions and loses by 18 conversions in the second test.)
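The arithmetic behind those totals can be sketched roughly as follows. The test-phase figures below (`rand_phase`, `mab_phase`) are hypothetical placeholders, not the numbers from the VWO article; the only things taken from the comment are that the test stops at significance and that all remaining traffic then converts at the winner's real rate of 20%.

```python
def projected_conversions(test_visitors, test_conversions, winner_rate, total_visitors):
    """Expected conversions after total_visitors, assuming the test has ended
    and every remaining visitor is sent to the winning variation."""
    if total_visitors < test_visitors:
        raise ValueError("total_visitors must cover the whole test phase")
    remaining = total_visitors - test_visitors
    return test_conversions + winner_rate * remaining


# Hypothetical test-phase outcomes: (visitors used by the test, conversions seen).
rand_phase = (200, 30)   # plain A/B split
mab_phase = (500, 95)    # a bandit variation that needed more visitors

WINNER_RATE = 0.20       # real rate of the better path in the thread's example

for total in (10_000, 50_000):
    rand_total = projected_conversions(*rand_phase, WINNER_RATE, total)
    mab_total = projected_conversions(*mab_phase, WINNER_RATE, total)
    # The gap depends only on the test phase, not on `total`.
    print(total, round(mab_total - rand_total, 6))
```

Once both tests are over, each strategy gains the same `winner_rate` per extra visitor, so the difference between them is fixed by what happened during the test phase.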


hythloday 3727 days ago

> up to the time when statistical significance is achieved. You generally then stop the test

Just a note, don't literally do this:

http://conversionxl.com/statistical-significance-does-not-eq...


disgruntledphd2 3727 days ago

Just to reiterate, this violates the assumptions under which you get your p-values.

I want an A/B testing tool that won't let you see results until it's done.
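A small simulation can illustrate why peeking is a problem. This is only a sketch under assumed settings: both variations share the same true rate (an A/A test), significance is checked with a pooled two-proportion z-test, and the parameter values (`runs`, `visitors_per_arm`, `peek_every`) are arbitrary choices, not anything from the linked article.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(runs=300, visitors_per_arm=1000, peek_every=50,
                         rate=0.10, alpha=0.05, seed=0):
    """Return (rate with peeking, rate with a single final look) for A/A tests."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        stopped_early = False
        for i in range(1, visitors_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peeking: declare a winner the first time p < alpha at any look.
            if i % peek_every == 0 and not stopped_early:
                if z_test_p_value(conv_a, i, conv_b, i) < alpha:
                    stopped_early = True
        peeking_hits += stopped_early
        # Fixed horizon: look once, after all visitors have been seen.
        final_p = z_test_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm)
        fixed_hits += final_p < alpha
    return peeking_hits / runs, fixed_hits / runs


peeking, fixed = false_positive_rates()
print(f"false positives with peeking: {peeking:.2%}, single look: {fixed:.2%}")
```

Since there is no real difference between the two arms, every "significant" result here is a false positive; stopping at the first significant look gives the test many chances to produce one.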


raverbashing 3727 days ago

Interesting

This suggests to me that, as with a lot of algorithms, you might want to change your parameters during training

So start with MAB-100 (RAND) and then decrease that % over time
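One way to read that idea is an epsilon-greedy bandit whose exploration share decays over time. The sketch below is an assumption-laden illustration, not the algorithm from the VWO article: the decay schedule (`start`, `floor`, `half_life`) is hypothetical, and MAB-x is taken to mean "explore x% of the time".

```python
import random


def decaying_epsilon_greedy(true_rates, visitors=10_000, start=1.0,
                            floor=0.10, half_life=1_000, seed=0):
    """Simulate an epsilon-greedy bandit whose exploration rate decays.

    Returns (shows, wins): how often each arm was shown and how often it converted.
    """
    rng = random.Random(seed)
    arms = range(len(true_rates))
    shows = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for t in range(visitors):
        # Exploration starts at `start` (1.0 is MAB-100, i.e. RAND), halves
        # every `half_life` visitors, and never drops below `floor`.
        epsilon = max(floor, start * 0.5 ** (t / half_life))
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore: pick uniformly
        else:
            # Exploit: best observed conversion rate; unseen arms count as 0.
            arm = max(arms, key=lambda i: wins[i] / shows[i] if shows[i] else 0.0)
        shows[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return shows, wins


# Real rates of 10% and 20%, as in the two-way test discussed in the thread.
shows, wins = decaying_epsilon_greedy([0.10, 0.20])
print("shows:", shows, "conversions:", sum(wins))
```

The early, fully random phase gathers evidence on every arm; as the exploration share shrinks, more traffic goes to whichever arm currently looks best.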


kriro 3727 days ago

The counterargument would be: in the long run, random positive changes in clicks don't pay the bills. Systemic changes do.


raverbashing 3727 days ago

Yes and you don't get systemic change with A/B testing

I find it funny when some people think A/B (or MAB) testing will solve major usability problems

You can definitely try genetic-programming your website to conversion; it's probably going to be fun to watch
