by freedom123
4316 days ago
This is an unfortunate oversimplification (in both responses). The argument against invalid data rests on an assumption, "Most of our experiments have nothing to do with each other", and that isn't scientific method. It's simply an opinion, and it shows you are making no effort to get clean data, meaning the truth. If scenarios exist that can invalidate your findings, you use every scientific method to avoid them, not enable them. "Pinterest is a pretty big website, so it's unlikely that our experiments affect each other in ways that'd make our tests invalid." This is the very reason why you will have an attribution problem.
1. You're running a ton of tests, yet I see no mention of how you're adjusting for multiple testing. The more tests you run, the higher your chance of getting a false positive. Couple this with the fact that the majority of things you test probably won't be significantly better, and your chance of encountering a false positive is much higher than you might think. You're running hundreds of tests at a significance threshold of 0.05, but the chance that at least one of them is a false positive is far higher than 5%. Beware multiple tests, and especially beware the base rate fallacy.
2. Your post has no mention of statistical power or the size of the effect you're looking to detect. That makes me think you might not have considered this. If you don't know the effect size you're looking for or your statistical power, your A/B test results can't be trusted -- as you have no idea what your chances are of actually detecting a beneficial result if it exists.
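To put rough numbers on both points, here is a minimal sketch in plain Python (standard library only). Everything in it is an illustrative assumption and not anything from the original post: the function names are made up, the tests are treated as independent, the 10% base rate of truly better variants is a guess, and the sample size uses the usual normal approximation for a two-sided two-proportion z-test.

```python
import math
from statistics import NormalDist


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent
    tests when none of the variants has a real effect."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha
    (Bonferroni correction, conservative)."""
    return alpha / n_tests


def false_discovery_share(alpha, power, prior_true):
    """Share of 'significant' results that are false positives, given the
    base rate prior_true of variants that are truly better."""
    false_pos = alpha * (1.0 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)


def sample_size_per_arm(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect p_base -> p_variant with
    a two-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_beta = std_normal.inv_cdf(power)
    variance = p_base * (1.0 - p_base) + p_variant * (1.0 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical numbers: 100 tests at alpha = 0.05.
    print(familywise_error_rate(0.05, 100))
    print(bonferroni_alpha(0.05, 100))
    # Assumed: 10% of tested ideas truly help, tests run at 80% power.
    print(false_discovery_share(0.05, 0.8, 0.10))
    # Assumed: detecting a lift from a 10% to an 11% conversion rate.
    print(sample_size_per_arm(0.10, 0.11))
```

Under those assumptions, 100 independent null tests at 0.05 almost guarantee at least one false positive, and with a 10% base rate about a third of the "wins" would be false even at 80% power. The last function shows why the effect size has to be decided up front: a one-point lift on a 10% conversion rate needs on the order of 15,000 users per arm, and the requirement grows with the inverse square of the difference you want to detect.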