by yummyfajitas 3853 days ago
So treat it as an identity problem, throw away all of the users you find questionable,...

And if questionability is correlated with the thing you are trying to measure, you've just added bias. For example, consider trying to measure engagement or something correlated with it. Are users who connect to your site from 3 different devices more or less engaged than normal? If they're more engaged, great - you just threw out your most engaged users.
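Here's a rough simulation of that bias. Everything in it is made up for illustration: engagement is a uniform score in [0, 1), and the chance that a user shows up on multiple devices (and so looks "questionable") is assumed to be half their engagement score. The function name and numbers are hypothetical.

```python
import random


def simulate_engagement(n_users=100_000, seed=0):
    """Compare mean engagement before and after dropping 'questionable' users."""
    rng = random.Random(seed)
    everyone, kept = [], []
    for _ in range(n_users):
        # Assumption: engagement is a latent score, uniform on [0, 1).
        engagement = rng.random()
        # Assumption: more engaged users are more likely to use several
        # devices, which is what makes their identity look questionable.
        multi_device = rng.random() < 0.5 * engagement
        everyone.append(engagement)
        if not multi_device:
            kept.append(engagement)
    true_mean = sum(everyone) / len(everyone)
    filtered_mean = sum(kept) / len(kept)
    return true_mean, filtered_mean


true_mean, filtered_mean = simulate_engagement()
print(f"all users:      {true_mean:.3f}")
print(f"after filtering: {filtered_mean:.3f}")
```

Under these assumptions the filtered mean should land around 0.44 rather than the true 0.5, and no amount of extra data fixes it, because the filter itself depends on the thing being measured.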

Similarly, you can't just use a session cookie to test per-session behavior. This introduces correlations between sessions, which violates the IID assumption in all the standard statistical tests.

https://www.chrisstucchio.com/blog/2015/no_free_samples.html

You can fix this if you want by using the central limit theorem for weakly mixing sequences or by explicitly putting the mixing into a Bayesian analysis. But that's probably a lot trickier than just using a long-term cookie.


You have to know the limitations of the approach you are using.

Also, about session cookies: no correlation is created if the A/B test behavior is tied to the session. The downside is that the same user can get different behaviors on different days, which may be a bad user experience. The upside is that it is quick and simple for things like landing pages.

In the end there is no solution that avoids actually understanding what your data really says.

There is absolutely correlation between sessions. If visitor 1 (corresponding to sessions 1,2,3) has a high conversion probability, while visitor 2 (corresponding to sessions 4,5,6) has a low conversion probability, then you've introduced correlation between sessions 1,2,3 and sessions 4,5,6. This breaks the CLT and all the usual independence assumptions.

If most of your visitors have only one session, this may not matter. But then again, with only session cookies you have no way to know that.
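The visitor 1 / visitor 2 example can be checked numerically. This is a minimal sketch with made-up numbers: each visitor is assumed to have a conversion probability of either 0.05 or 0.5 and exactly 3 sessions. It repeats the same experiment many times and compares the standard error you would compute by treating sessions as IID with the actual spread of the measured conversion rate.

```python
import random
import statistics


def one_experiment(rng, n_visitors=200, sessions_per_visitor=3):
    """Return (observed conversion rate, naive IID standard error)."""
    conversions = 0
    for _ in range(n_visitors):
        # Assumption: a visitor is either a low or a high converter,
        # and all of that visitor's sessions share the same probability.
        p = rng.choice([0.05, 0.5])
        for _ in range(sessions_per_visitor):
            if rng.random() < p:
                conversions += 1
    n_sessions = n_visitors * sessions_per_visitor
    rate = conversions / n_sessions
    # Binomial standard error, valid only if sessions are independent.
    naive_se = (rate * (1 - rate) / n_sessions) ** 0.5
    return rate, naive_se


rng = random.Random(0)
results = [one_experiment(rng) for _ in range(2000)]
rates = [rate for rate, _ in results]
naive_ses = [se for _, se in results]

actual_sd = statistics.stdev(rates)          # real run-to-run variability
mean_naive_se = statistics.mean(naive_ses)   # what the IID formula claims
print(f"actual sd of rate: {actual_sd:.4f}")
print(f"naive IID se:      {mean_naive_se:.4f}")
print(f"ratio:             {actual_sd / mean_naive_se:.2f}")
```

With these assumed numbers the real standard deviation works out to about 1.2 times the naive one, so per-session confidence intervals come out too narrow. Setting `sessions_per_visitor=1` should bring the ratio back to roughly 1, which is the "may not matter" case.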