by elehack 2577 days ago
Yes. Bandits will often converge more quickly to the optimal strategy, but it is much more difficult to understand why that strategy is optimal and generalize from the bandit outcomes to predict future performance and performance of other strategies.

It isn't impossible - bandits are seeing adoption in medical trials to avoid precisely the problem discussed - but the standard experiment design and analysis techniques you learn in a decent college statistics class or introductory statistics text no longer apply. That's one of the beauties of A/B testing: while it does require substantial thought to do well, the basic statistics of the setup are very well-understood at this point.
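To make the contrast concrete, here is a minimal sketch, not anyone's production setup. It assumes two variants with made-up conversion rates and uses Thompson sampling as one example of a bandit policy; the function names `ab_test` and `thompson` are just illustrative. The A/B loop keeps the split fixed, which is what the textbook analysis relies on, while the bandit shifts traffic toward whatever looks best so far, so the per-arm sample sizes depend on the outcomes.

```python
import random


def ab_test(rates, n_rounds, rng):
    """Fixed equal split: cycle through arms regardless of observed outcomes."""
    pulls = [0] * len(rates)
    wins = [0] * len(rates)
    for t in range(n_rounds):
        arm = t % len(rates)
        pulls[arm] += 1
        wins[arm] += rng.random() < rates[arm]  # simulated conversion
    return pulls, wins


def thompson(rates, n_rounds, rng):
    """Thompson sampling with a Beta(1, 1) prior on each arm's conversion rate."""
    pulls = [0] * len(rates)
    wins = [0] * len(rates)
    for _ in range(n_rounds):
        # Draw one plausible rate per arm from its posterior, play the largest.
        samples = [
            rng.betavariate(1 + wins[a], 1 + pulls[a] - wins[a])
            for a in range(len(rates))
        ]
        arm = max(range(len(rates)), key=samples.__getitem__)
        pulls[arm] += 1
        wins[arm] += rng.random() < rates[arm]  # simulated conversion
    return pulls, wins


# Hypothetical true rates, chosen only for illustration.
true_rates = [0.1, 0.9]
print("A/B pulls:   ", ab_test(true_rates, 1000, random.Random(0))[0])
print("bandit pulls:", thompson(true_rates, 1000, random.Random(0))[0])
```

The lopsided pull counts are the point: the bandit wastes less traffic on the loser, but the losing arm ends up with a small, outcome-dependent sample, which is why the usual fixed-sample tests don't carry over directly.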


But for results to generalize, or to understand why, the confounders must be accounted for in the randomization. This is really hard to do well -- there are often subtle influences whose impact on these non-linear systems isn't sufficiently understood. What makes someone convert? A million different factors; changing the color of a button in one context doesn't necessarily tell me much about how people would respond to that experience in another context.

It's easy to underestimate how complex things are, because we only see some superficial aspects of e.g. a user/software interaction model. This flaw is down to how our brains work -- ref "What you see is all there is".

I disagree. I’ve spent a lot of time staring at bandit outcomes and usually they match some sort of intuition of why a variant might be exceptional.
That could be post-hoc reasoning, though. It would be interesting to pre-register your hypotheses, or see whether you could tell bandit outcomes from random ones.
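A rough way to try the "tell them from random ones" check, as a sketch under made-up numbers: simulate a batch of variants that all share the same true conversion rate and look at the spread of observed rates. The function name `observed_rates` and the parameters (20 variants, 500 visitors each, 10% rate) are assumptions for illustration only.

```python
import random


def observed_rates(n_variants, n_visitors, rate, rng):
    """Observed conversion rates for variants that share one true rate."""
    results = []
    for _ in range(n_variants):
        conversions = sum(rng.random() < rate for _ in range(n_visitors))
        results.append(conversions / n_visitors)
    return results


# All 20 hypothetical variants are identical by construction.
rates = observed_rates(n_variants=20, n_visitors=500, rate=0.10, rng=random.Random(42))
print("best observed: ", max(rates))
print("worst observed:", min(rates))
```

Some variant always comes out on top even though none is actually better, so a story about why the "winner" won is available whether or not there is anything to explain.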
Sure it’s post-hoc reasoning, but it doesn’t matter because I’m not trying to invalidate a hypothesis.

I’m looking for variants that win. When I find one that wins I look at it and try to add more of the same flavor to the product.

This process works.

This is literally the logical fallacy. You could get lucky. Maybe you have obvious gains to chase. But bad logical arguments are bad because they never work forever. They are corrupted heuristics that can get you in trouble without critical thinking.

Edit: added in forever. Phone dropped some wording I originally had. I think.

Call it a genetic algorithm if you like. I’m looking for incremental wins in a world of infinite possibilities, not truth.
Incremental wins can still lead to dead ends. My phrasing was off in my post. I meant to say that the problem with fallacies isn't that the tactics never work, just that they can stop working without you really realizing it. A heuristic that can lead you down a dead end.

By all means, keep doing it if it is working for you. But don't mistake it for good advice. And stay vigilant.

Isn't this problem also an issue when people talk about transferring what they learn from one test to another test? That is frequently cited as a benefit of A/B testing.