| I think you and the other guy want that single conversion to be evidence, but in reality, it's statistical noise. It is evidence, just like any other properly collected data point. It's just very weak evidence, is what we're saying. Of course in real world situations there may be a lot of variance and the correct answer may well turn out to be the other one. But in the absence of additional information, that is true for literally any number of samples that is less than whatever proportion of the population would give you absolute proof that your chosen answer is correct. If you have 50%-1 samples and every single one went with option A, you're still wrong if the other 50%+1 would have gone for option B. What you're calling "noise" is an ill-defined concept. Qualitatively there is no difference for a result in a two-way test between a single sample and 50%-1. You still don't know for sure which answer is the right one. However, you're going to be much more confident about having the right answer in the latter case, which is what I think closed was trying to explain to you. Again, it's not about going in with an assumption of which is better, it's about realizing that in split testing the biggest challenge is disproving the null hypothesis. But if you're running a test with null and alternative hypotheses, you are going in with an a priori preference for one outcome over the other. You are literally saying that if the result is close enough, you will prefer not to reject the null hypothesis, and therefore whichever variation you have arbitrarily chosen to be your null hypothesis will be the answer. That is self-evidently not a neutral assessment of option A vs. option B, and therefore there will be some cases where your test is more likely than not to make the wrong decision. In short, you are using an inappropriate test for the situation that closed was describing. |
>> You are literally saying that if the result is close enough, you will prefer not to reject the null hypothesis, and therefore whichever variation you have arbitrarily chosen to be your null hypothesis will be the answer.
This is a misunderstanding. The null hypothesis is that your two variants have no statistical impact on conversion and any edge you see is just random. That is the hurdle you have to overcome to gain any useful direction from B testing.
GL!