by uyt 1822 days ago
I think it's known as the "novelty effect" in the industry.

Yup. A common way to handle this is to throw away the earliest responses, and then run your analysis. This is sometimes called “burn in”.

If the lift goes away, you know it wasn’t real.
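To make that concrete, here is a rough sketch of a burn-in check. The `Response` record, its field names, and the `burn_in_days` cutoff are all made up for illustration; a real pipeline would have its own schema and a cutoff chosen for the product.

```python
from dataclasses import dataclass


@dataclass
class Response:
    arm: str               # "control" or "treatment" (hypothetical labels)
    days_since_start: int  # days since the experiment began
    converted: bool        # whether this response counted as a success


def conversion_rate(responses, arm):
    """Fraction of responses in the given arm that converted."""
    hits = [r.converted for r in responses if r.arm == arm]
    return sum(hits) / len(hits) if hits else float("nan")


def lift(responses):
    """Relative lift of treatment over control."""
    control = conversion_rate(responses, "control")
    treatment = conversion_rate(responses, "treatment")
    return (treatment - control) / control


def lift_after_burn_in(responses, burn_in_days):
    """Drop the earliest responses, then recompute the lift."""
    kept = [r for r in responses if r.days_since_start >= burn_in_days]
    return lift(kept)
```

If `lift(responses)` looks healthy but `lift_after_burn_in(responses, 7)` is close to zero, that points to novelty rather than a lasting effect.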

Or compare only new users that have only ever seen either the control or the treatment.
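A minimal sketch of that filter, assuming you have per-user exposure logs and signup days. The dict shapes and the names `exposures`, `signup_day`, and `experiment_start_day` are hypothetical.

```python
def new_user_cohort(exposures, signup_day, experiment_start_day):
    """Keep users who signed up during the test and saw exactly one arm.

    exposures: dict of user_id -> set of arms that user has ever seen
    signup_day: dict of user_id -> day number the user signed up
    Returns a dict of user_id -> the single arm that user saw.
    """
    cohort = {}
    for user_id, arms in exposures.items():
        joined_during_test = signup_day[user_id] >= experiment_start_day
        if joined_during_test and len(arms) == 1:
            cohort[user_id] = next(iter(arms))
    return cohort
```

The analysis would then be run on `cohort` only, so nobody in it has a "before" experience to compare the treatment against.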

But the effect is real, sure enough. It’s just that to keep getting it, you have to mix up the title/subject line every now and then.

I wouldn’t do that.

First, it seems impractical, because you need high user growth to get enough samples in a reasonable time to run your experiment to statistical significance. If you are seeing high growth, how many of those accounts are real people rather than bots or duplicate accounts? Maybe that isn’t too big of a deal for an email campaign, but for apps and websites it matters, because of the next reason.

Secondly, there’s no reason to believe that new users behave like old users; in fact, there are probably many reasons to believe that they do not. Because your A/B population isn’t representative of the actual population of users, there’s no reason to believe whatever effect you measured on the neophyte users is going to carry over to the experienced users. Maybe it will, but it’s not guaranteed.

Random sampling of users is better, because the A/B population is drawn from the general population, and so the only confounding property is the novelty effect. Doing what you suggest eliminates the novelty effect, but now you can’t generalize your learnings at all.
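For what it's worth, one common way to get that kind of random split over the whole user base is to hash the user ID. This is only a sketch: `assign_arm`, the experiment name used as a salt, and the 10,000-bucket granularity are my own choices, not anything specific to a particular A/B framework.

```python
import hashlib

BUCKETS = 10_000  # granularity of the split (an arbitrary choice)


def assign_arm(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to an arm, independent of account age."""
    # Salting with the experiment name keeps assignments in different
    # experiments from lining up with each other.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return "treatment" if bucket < treatment_share * BUCKETS else "control"
```

Because the assignment depends only on the ID and the experiment name, old and new users land in both arms in roughly the same proportions as in the general population.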