by Silhouette 3235 days ago
>> I think you and the other guy want that single conversion to be evidence, but in reality, it's statistical noise.

It is evidence, just like any other properly collected data point. It's just very weak evidence, is what we're saying.

Of course, in real-world situations there may be a lot of variance, and the correct answer may well turn out to be the other one. But in the absence of additional information, that is true for literally any sample smaller than the share of the population that would prove your chosen answer correct. If you have sampled half the population minus one and every single one of them went with option A, you're still wrong if the other half plus one would all have gone for option B.

What you're calling "noise" is an ill-defined concept. Qualitatively, in a two-way test, there is no difference between a result based on a single sample and one based on half the population minus one: either way, you still don't know for sure which answer is the right one. However, you're going to be much more confident that you have the right answer in the latter case, which I think is what closed was trying to explain to you.
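One way to put a number on "very weak evidence" is a Bayesian comparison of the two conversion rates. This is my own illustration, not something either commenter proposed: it assumes independent uniform Beta(1, 1) priors on each variant's conversion rate and uses the known closed-form expression for P(X > Y) between two Beta variables with integer parameters. The function names and the visitor and conversion counts are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_x_beats_y(x_conv: int, x_visits: int, y_conv: int, y_visits: int) -> float:
    """Posterior probability that variant x has the higher conversion rate.

    Assumes independent uniform Beta(1, 1) priors, so each posterior is
    Beta(conversions + 1, non-conversions + 1). The sum is the closed form
    for P(X > Y) between two Beta variables with integer parameters.
    """
    ax, bx = x_conv + 1, x_visits - x_conv + 1
    ay, by = y_conv + 1, y_visits - y_conv + 1
    total = 0.0
    for i in range(ax):
        total += exp(
            log_beta(ay + i, by + bx)
            - log(bx + i)
            - log_beta(1 + i, bx)
            - log_beta(ay, by)
        )
    return total


# Hypothetical counts: a one-conversion lead out of 1,000 visitors per variant...
one_lead = prob_x_beats_y(11, 1000, 10, 1000)
# ...and the same conversion rates with ten times the traffic.
ten_times = prob_x_beats_y(110, 10000, 100, 10000)
print(f"{one_lead:.2f} {ten_times:.2f}")
```

The first probability is only modestly above 0.5: the lead is evidence for the leading variant, just weak evidence. The second is higher, which matches the point that more samples buy more confidence without ever delivering certainty.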

>> Again, it's not about going in with an assumption of which is better, it's about realizing that in split testing the biggest challenge is disproving the null hypothesis.

But if you're running a test with null and alternative hypotheses, you are going in with an a priori preference for one outcome over the other. You are literally saying that if the result is close enough, you will prefer not to reject the null hypothesis, and therefore whichever variation you have arbitrarily chosen to be your null hypothesis will be the answer.

That is self-evidently not a neutral assessment of option A vs. option B, and therefore there will be some cases where your test is more likely than not to make the wrong decision. In short, you are using an inappropriate test for the situation that closed was describing.
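To make the asymmetry concrete, here is a minimal sketch of a common decision rule, assuming a one-sided pooled two-proportion z-test at alpha = 0.05 (my choice of test; the thread does not name one): keep the control unless the challenger is significantly better. The names `keep_variant` and `one_sided_p_value` and the counts are hypothetical.

```python
from math import erfc, sqrt

# (name, conversions, visitors); the values used below are hypothetical.
Variant = tuple[str, int, int]


def one_sided_p_value(control: Variant, challenger: Variant) -> float:
    """p-value for H1 "challenger converts better than control" (pooled z-test)."""
    _, x1, n1 = control
    _, x2, n2 = challenger
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # pooled rate is 0 or 1: the data cannot separate the variants
    z = (x2 / n2 - x1 / n1) / se
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal


def keep_variant(control: Variant, challenger: Variant, alpha: float = 0.05) -> str:
    """Switch to the challenger only if the null hypothesis is rejected."""
    if one_sided_p_value(control, challenger) < alpha:
        return challenger[0]
    return control[0]


a: Variant = ("A", 10, 1000)
b: Variant = ("B", 11, 1000)
print(keep_variant(control=a, challenger=b))  # prints A: A was the null default
print(keep_variant(control=b, challenger=a))  # prints B: B was the null default
```

Same data, opposite decisions: the only thing that changed is which variant was designated as the default that the test has to overturn.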


Alright, last comment from my side, just to clarify:

>> You are literally saying that if the result is close enough, you will prefer not to reject the null hypothesis, and therefore whichever variation you have arbitrarily chosen to be your null hypothesis will be the answer.

This is a misunderstanding. The null hypothesis is that your two variants have no statistical impact on conversion and any edge you see is just random. That is the hurdle you have to overcome to gain any useful direction from A/B testing.

GL!
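The null hypothesis described in that comment can be illustrated with a simulated A/A test: give both variants exactly the same true conversion rate and count how often one of them still ends up ahead. Because the variants are identical by construction, every lead is pure chance. The sketch assumes a 2% conversion rate, 500 visitors per variant and a fixed random seed, all hypothetical values chosen for illustration.

```python
import random


def conversions(rate: float, visitors: int, rng: random.Random) -> int:
    """Count visitors who convert, each independently with probability `rate`."""
    return sum(rng.random() < rate for _ in range(visitors))


def share_with_an_edge(rate: float = 0.02, visitors: int = 500,
                       runs: int = 2000, seed: int = 1) -> float:
    """Fraction of A/A runs (identical true rates) in which one variant leads."""
    rng = random.Random(seed)
    leads = 0
    for _ in range(runs):
        if conversions(rate, visitors, rng) != conversions(rate, visitors, rng):
            leads += 1
    return leads / runs


print(f"runs where one identical variant still 'won': {share_with_an_edge():.0%}")
```

In a setup like this most runs show some lead, so a lead on its own says little until a test shows it would be unlikely under the null.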

Fair enough, my phrasing before was a little casual, but the underlying point is sound. A hypothesis test might tell you that there is no significant impact on conversion at your chosen significance level. However, you still have to choose between option A and option B. If you have no a priori reason to favour one as the default and no additional data to consider (which, again, is a crucial detail in the situation closed was talking about), you should logically still choose whichever option was most successful during your experiment. This is simply because if your conclusion was correct and there is no impact on conversion, it doesn't matter which you pick, but if your result was a false negative, the option that was more successful during the experiment is more likely to be the better choice. Given that you're going to pick that one anyway, your hypothesis test hasn't actually provided any useful information to help inform your decision in this scenario.
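This argument can be checked with a small simulation. It is an illustration under stated assumptions, not a general result: one variant is truly better (2% vs 3% conversion), which one is chosen at random with equal probability so neither has an a priori edge, and each gets 500 visitors. Two rules are compared: keep A unless B wins a one-sided pooled z-test at alpha = 0.05, or simply pick whichever variant converted more (ties broken at random). The rates, traffic, seed and function names are hypothetical.

```python
import random
from math import erfc, sqrt


def conversions(rate: float, visitors: int, rng: random.Random) -> int:
    """Count visitors who convert, each independently with probability `rate`."""
    return sum(rng.random() < rate for _ in range(visitors))


def b_significantly_better(xa: int, xb: int, n: int, alpha: float = 0.05) -> bool:
    """One-sided pooled two-proportion z-test with n visitors per variant."""
    pooled = (xa + xb) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    if se == 0:
        return False  # pooled rate is 0 or 1: no evidence either way
    z = (xb - xa) / n / se
    return 0.5 * erfc(z / sqrt(2)) < alpha


def accuracy(low: float = 0.02, high: float = 0.03, n: int = 500,
             runs: int = 2000, seed: int = 7) -> tuple[float, float]:
    """Share of runs in which each rule picks the truly better variant."""
    rng = random.Random(seed)
    gated_ok = winner_ok = 0
    for _ in range(runs):
        better = rng.choice("AB")  # no a priori favourite
        rate_a, rate_b = (high, low) if better == "A" else (low, high)
        xa, xb = conversions(rate_a, n, rng), conversions(rate_b, n, rng)

        # Rule 1: A is the null default; switch only on a significant result.
        gated = "B" if b_significantly_better(xa, xb, n) else "A"
        # Rule 2: pick the observed winner, breaking ties at random.
        if xa != xb:
            winner = "A" if xa > xb else "B"
        else:
            winner = rng.choice("AB")

        gated_ok += gated == better
        winner_ok += winner == better
    return gated_ok / runs, winner_ok / runs


gated, winner = accuracy()
print(f"keep A unless significant: {gated:.2f}")
print(f"pick the observed winner:  {winner:.2f}")
```

Under these assumptions the second rule picks the truly better variant more often. If there were an a priori reason to favour A, as the comment notes, the comparison could come out differently.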

In any case, we seem to be talking at cross-purposes here, so perhaps we'll have to agree to disagree on this one.