
by sbov 4075 days ago

I'm not a statistician, but lately I've been wondering:

When we're A/B testing code, the code is already written. If there's a 5%, or even a 15%, chance of it being bullshit, who cares? The effort is usually exactly the same whether I switch or not.

It's my understanding that thresholds like 95% and 99% were established for decisions that require extra work to act on. We don't want to spend extra time developing and marketing a new drug if it isn't effective. We don't want to tell people to do A instead of B if we aren't sure A is really better than B.

But in software, I've already spent all the time needed to implement the variation on the feature. Given that, why do I need 95%?

I would appreciate it if someone with more knowledge could answer this question.

Edit to add: I see a lot of answers about the cost to keep the code around. What about A/B tests that don't require extra code, just different code? Most of our A/B tests fall into this category.
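To make the trade-off in the question concrete, here is a minimal simulation sketch. It assumes a hypothetical case where variant B truly converts slightly worse than A (10% vs 9%) with 1,000 visitors per arm; these rates, the sample size, and the function names are illustrative assumptions, not data from the thread. It counts how often B gets shipped under two rules: "ship whichever looks better" and "ship B only if a one-sided two-proportion z-test clears 95%".

```python
import math
import random
from statistics import NormalDist


def one_sided_z(conv_a: int, conv_b: int, n: int) -> float:
    """Pooled two-proportion z statistic for H1: rate_B > rate_A, n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def wrong_ship_rates(p_a=0.10, p_b=0.09, n=1000, trials=1000, seed=0):
    """Simulate tests where B is truly worse; return how often each rule ships B.

    naive:       ship B whenever its observed conversions beat A's
    significant: ship B only if the one-sided z-test passes at 95%
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.95)  # about 1.645
    naive = significant = 0
    for _ in range(trials):
        # Draw each visitor's conversion independently (Bernoulli trials).
        conv_a = sum(rng.random() < p_a for _ in range(n))
        conv_b = sum(rng.random() < p_b for _ in range(n))
        if conv_b > conv_a:
            naive += 1
        if one_sided_z(conv_a, conv_b, n) > z_crit:
            significant += 1
    return naive / trials, significant / trials


if __name__ == "__main__":
    naive, significant = wrong_ship_rates()
    print(f"naive rule shipped the worse variant in {naive:.1%} of runs")
    print(f"95% rule shipped the worse variant in {significant:.1%} of runs")
```

Under these assumed numbers the naive rule should ship the worse variant in very roughly one run in five, while the 95% rule rarely does; the exact figures depend on the seed. Whether that error rate matters is the cost question the replies take up.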


oberstein 4075 days ago

You would be better served by a Bayesian approach to A/B testing, which directly measures the probability that B converts better than A. http://www.evanmiller.org/bayesian-ab-testing.html You can then apply a decision rule, such as requiring the difference to be above a certain threshold.
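A minimal sketch of that idea, assuming uniform Beta(1, 1) priors and plain conversion counts. It evaluates the closed-form sum for P(p_B > p_A) given on the linked page, working in log-beta space to avoid overflow. The function names and the 0.95 cut-off in the usage lines are hypothetical choices, not something the article prescribes.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """P(p_B > p_A) under independent Beta(1, 1) priors on each conversion rate.

    Posteriors are Beta(1 + conversions, 1 + non-conversions). The sum runs
    over alpha_b terms, so alpha_b must be an integer (true for count data).
    """
    alpha_a, beta_a = 1 + conv_a, 1 + n_a - conv_a
    alpha_b, beta_b = 1 + conv_b, 1 + n_b - conv_b
    total = 0.0
    for i in range(alpha_b):
        total += exp(
            log_beta(alpha_a + i, beta_a + beta_b)
            - log(beta_b + i)
            - log_beta(1 + i, beta_b)
            - log_beta(alpha_a, beta_a)
        )
    return total


if __name__ == "__main__":
    p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"P(B beats A) = {p:.3f}")
    # Hypothetical decision rule: switch to B if P(B beats A) exceeds 0.95.
    print("switch to B" if p > 0.95 else "keep A")
```

The output is a probability rather than a pass/fail verdict, which lets you choose a threshold that matches what a wrong switch would actually cost.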


dgant 4075 days ago

Keeping the code around has a cost: the cost of maintaining, reading, compiling, deploying, and building around it. If the new feature isn't adding value, all it's doing is adding to the cost of the genuinely valuable features you want to develop later.


jy133 4075 days ago

Would you push a feature that negatively affected your product? With 95% confidence, you will be able to tell whether your feature is indeed positive, negative, or roughly neutral.
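One way to read that three-way outcome is through a 95% confidence interval for the difference in conversion rates. The sketch below uses the normal-approximation (Wald) interval; the function names and the half-percentage-point "neutral" margin are hypothetical. Note that an interval straddling zero is only "roughly neutral" if it is also narrow; otherwise the test is simply inconclusive at that sample size.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for rate_B - rate_A (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def classify(conv_a, n_a, conv_b, n_b, margin=0.005, confidence=0.95):
    """Label the effect of B relative to A; `margin` is a hypothetical tolerance."""
    low, high = diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence)
    if low > 0:
        return "positive"
    if high < 0:
        return "negative"
    if -margin <= low and high <= margin:
        return "roughly neutral"  # interval straddles 0 and is narrow
    return "inconclusive"  # interval straddles 0 but is too wide to tell


if __name__ == "__main__":
    print(classify(100, 1000, 150, 1000))
    print(classify(10, 100, 11, 100))
```

The extra "inconclusive" label is a design choice: it keeps "we could not detect a difference" from being reported as "there is no difference".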


andreasklinger 4075 days ago

I think the core question is:

validation of upside vs validation of downside

as in: I want to avoid pushing something that is worse, but I am optimistic (or even indifferent) about how much better something is

personal opinion: data trains gut feeling
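One standard way to encode "make sure it isn't worse, don't insist it's better" is a non-inferiority test. This is offered as an illustration rather than what the commenter proposed: the function name, the 0.5-percentage-point margin, and the normal approximation are all assumptions. B ships when the data rule out B being worse than A by more than the margin, even if B has not been shown to be better.

```python
import math
from statistics import NormalDist


def non_inferior(conv_a, n_a, conv_b, n_b, margin=0.005, confidence=0.95):
    """One-sided test that B is not worse than A by more than `margin`.

    H0: rate_B - rate_A <= -margin   (B is meaningfully worse)
    H1: rate_B - rate_A >  -margin   (B is at worst slightly worse)
    Returns True when H0 is rejected, i.e. shipping B looks safe.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        # Degenerate case (all-or-nothing conversions): compare directly.
        return p_b - p_a + margin > 0
    z = (p_b - p_a + margin) / se
    return z > NormalDist().inv_cdf(confidence)


if __name__ == "__main__":
    # Identical rates on a large sample: not better, but safely not worse.
    print(non_inferior(10000, 100000, 10000, 100000))
    # Identical rates on a small sample: too little data to rule out harm.
    print(non_inferior(10, 100, 10, 100))
```

With identical rates on a large sample, a test for superiority would never pass, but this test does, which matches the asymmetry described above: the burden of proof is on "not worse", not on "better".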
