by umanwizard
4531 days ago
Neat project. A few concerns: 1) Ideally you would be able to measure the change in every metric, not just the ones you whitelist for a specific experiment. What if adding one feature changes how people interact with a completely different feature? You would want to know about that. 2) Showing the change without any sort of hypothesis testing is just begging people to draw unfounded conclusions from the results. Instead of a vague note that more than 100 sessions are necessary to get significance, you need real confidence intervals at the very least.
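To make point 2 concrete, here is a minimal sketch of a confidence interval for the difference between two conversion rates. It assumes a plain normal-approximation (Wald) interval, which is only one of several options and is crude at small counts. The function name `diff_ci` and the counts in the usage line are made up for illustration and are not from the project.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for (rate_b - rate_a); hypothetical helper, not the project's API."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard error of the difference of two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Made-up counts: 3 vs. 5 conversions out of 100 sessions each.
low, high = diff_ci(3, 100, 5, 100)
print(f"difference in conversion rate: [{low:.3f}, {high:.3f}]")
```

With counts this small the interval straddles zero, so 100 sessions per variant cannot tell the two apart.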
The author could have implemented a simple Chi-square test and gotten CIs. The problem is that conversion rates are usually < 6% and that means you'd have to have a MASSIVE sample size to detect a difference.
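To put a rough number on "MASSIVE", here is a sketch of the standard normal-approximation sample-size formula for comparing two proportions. The 3% base rate, 10% relative lift, 5% significance level, and 80% power are assumptions chosen for illustration, and `sessions_per_arm` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_arm(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate sessions needed per variant for a two-sided two-proportion test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_power = NormalDist().inv_cdf(power)          # controls Type II error
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Assumed scenario: 3% baseline, looking for a 10% relative improvement.
print(sessions_per_arm(0.03, 0.10))
```

Under those assumptions the answer comes out in the tens of thousands of sessions per variant, and it grows quickly as the lift you want to detect shrinks.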
Basically, Type II error matters much more here than in typical statistical applications, so statistical power is super important.
The author could implement Bayesian statistics with a Beta distribution prior initialized with alpha = 3, beta = 100 (mimicking a 3% conversion rate). The results would be robust to this prior information. The problem is that there is no closed-form likelihood solution. This means you need to use Markov Chain Monte Carlo simulation. Web servers don't like that.
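For a feel of what the Beta(3, 100) prior does, here is a rough sketch under one simplifying assumption: each session is an independent convert/don't-convert trial, so the likelihood is binomial. Under that assumption the Beta prior updates by adding the observed counts, and the sketch swaps in plain Monte Carlo draws from the two posteriors in place of MCMC. `prob_b_beats_a` and the counts are hypothetical.

```python
import random

# Prior from the comment above: Beta(3, 100), roughly a 3% conversion rate.
PRIOR_ALPHA, PRIOR_BETA = 3, 100


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_b > rate_a) by sampling both Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior under the assumed binomial likelihood:
        # Beta(alpha + conversions, beta + non-conversions).
        rate_a = rng.betavariate(PRIOR_ALPHA + conv_a, PRIOR_BETA + n_a - conv_a)
        rate_b = rng.betavariate(PRIOR_ALPHA + conv_b, PRIOR_BETA + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Made-up counts: 30 vs. 36 conversions out of 1000 sessions each.
print(prob_b_beats_a(30, 1000, 36, 1000))
```

Whether a loop like this is cheap enough to run inside a web request depends on `draws`. It could also be precomputed offline.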
In my experience, if you see a nice 10% boost in conversion rate (conv. b / conv. a) after some representative period of time like a few days, you should just go with that result.
In that way, you don't ignore what's smacking you in the head. "The implementation had a higher conv. rate or not over a few days." Detecting small differences really well with stats is fairly pointless in this space.