Hacker News
by pyronite 3237 days ago
> A/B testing landing page and newsletter? Is it difficult to get honest feedback from people who care?

Individual feedback doesn't replace large-scale A/B testing. If you're getting feedback from people who care (possibly implying they know you personally), it's also possible that they could deliver biased or unrepresentative feedback.


For large-scale A/B testing you first need enough visitors to reach statistical significance; otherwise its feedback is unrepresentative as well. A lot of side projects probably don't have enough visitors for that.
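For a rough sense of how many visitors "enough" might be, here is a minimal sketch using the standard normal-approximation sample-size formula for comparing two conversion rates. The function name `visitors_per_variant`, the 5% vs 6% conversion rates, and the alpha/power defaults are all illustrative assumptions, not figures from this thread.

```python
import math
from statistics import NormalDist


def visitors_per_variant(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided
    two-proportion z-test (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_variant - p_base
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: detecting a lift from 5% to 6% conversion.
n = visitors_per_variant(0.05, 0.06)
print(n)
```

Under these assumed rates the answer comes out at roughly eight thousand visitors per variant, which is the scale a small side project may struggle to reach.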
I disagree. If a person has to make a decision under uncertainty and a priori favors neither group A nor B, then they might as well use any visitor information available to them to guide their choice.

They just shouldn't be too confident they've made the correct choice.

You are just using noise then. It's not a matter of opinion, it's statistics.
If you are waiting for N observations so that an NHST will have some level of power, and you assume each observation is drawn from the same distribution (as your test likely does), then you do not see each observation as noise.

You will just be acting under reduced certainty, but if you have to act, any information is better than no information.

(I'd be very interested to hear your statistical explanation).

The trouble is disproving the null hypothesis. In your test, if one variant beats another, you take that as a weak signal that one may be better than the other. The data doesn't support this. Without applying a standard to your p-value, you cannot disprove the null hypothesis: that your variant is likely no better or worse.
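To make the p-value argument concrete, here is a minimal sketch of a pooled two-proportion z-test, one common way to test an A/B result against the null hypothesis of no difference. The function name `two_proportion_p_value` and the conversion counts are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both variants share one conversion rate
    (pooled two-proportion z-test, normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): no evidence either way
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical small test: 5/100 conversions for A, 8/100 for B.
p = two_proportion_p_value(5, 100, 8, 100)
print(round(p, 2))
```

With these made-up counts, B leads A but the p-value is far above a conventional 0.05 threshold, so the test does not reject the null hypothesis.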

I'm not a statistician, but I've run a lot of A/B tests.

You're ignoring closed's point that "a priori favors neither group A or B".

If you are starting from a neutral position, considering two possible alternatives with neither presumed to be more favourable than the other, then any statistical test based on using one outcome as null and the other as alternative hypothesis is fundamentally inappropriate. Any such test inherently favours one outcome over the other, rather than starting from a neutral position.

As closed is trying to explain, if you really do start from neutral then even a tiny number of data points is still better than no data at all. You shouldn't have too much confidence in whether you're really making the right decision, but if you have to make a decision, you are still more likely to make the right one if you go with what the data tells you, even if it's only telling you by a very small margin.
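One way to check this claim is a small simulation: repeatedly run a tiny experiment, always go with whichever variant got more conversions, and count how often that picks the truly better one. This is only a sketch; the function name `pick_leader_accuracy`, the 5% and 6% true rates, and the 50 visitors per variant are assumptions for illustration, and the two true rates are assumed to differ.

```python
import random


def pick_leader_accuracy(p_a, p_b, visitors_per_variant, trials=5000, seed=0):
    """Fraction of simulated experiments in which choosing the variant with
    more conversions picks the truly better one (ties broken by coin flip)."""
    rng = random.Random(seed)
    better = "A" if p_a > p_b else "B"
    correct = 0
    for _ in range(trials):
        conv_a = sum(rng.random() < p_a for _ in range(visitors_per_variant))
        conv_b = sum(rng.random() < p_b for _ in range(visitors_per_variant))
        if conv_a == conv_b:
            choice = rng.choice("AB")  # no information: fall back to a coin flip
        else:
            choice = "A" if conv_a > conv_b else "B"
        correct += choice == better
    return correct / trials


# Hypothetical true rates: A converts at 5%, B at 6%, only 50 visitors each.
accuracy = pick_leader_accuracy(0.05, 0.06, visitors_per_variant=50)
print(accuracy)
```

Under these assumptions the accuracy lands somewhat above 0.5: better than a coin flip, but far from certain, which matches the "don't be too confident" caveat.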

If one variant beats another, even with very few observations, the data DOES support that one is better. It's just that you might not be very confident that one is better.

The key to understanding this situation statistically is to reframe the way you think about tests, away from an all-or-nothing NHST and toward either confidence intervals or Bayesian estimation.

That is, some kind of measure of (loosely) uncertainty around a parameter (or entire model) of interest.
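As a sketch of what the Bayesian version could look like, the snippet below puts a uniform Beta(1, 1) prior on each conversion rate and estimates the posterior probability that B's rate exceeds A's by Monte Carlo sampling. The prior choice, the function name `prob_b_beats_a`, and the counts are all assumptions made for illustration.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors and binomial conversion data."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical small test: 5/100 conversions for A, 8/100 for B.
prob = prob_b_beats_a(5, 100, 8, 100)
print(round(prob, 2))
```

Instead of a reject/don't-reject verdict, this yields a graded statement: with these made-up counts B is probably better, at around four-to-one odds, which quantifies how much confidence the small sample supports.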

The question then is:

Is the available data more useful than a coin flip, which would be the alternative method of making a decision?

On the other hand, a coin-flip is probably the better tool. If you can't generate enough data for a statistical sample, then you're probably wasting your time creating an alternative version and setting up an A/B test.

I think people understand the theory, but do most people have enough traffic on a side project to have viable A/B tests?