by cschmidt
683 days ago
If you work for a large website (as I used to), they probably run hundreds of tests a week across various groups. So false positives are a real problem, and often you don't see the gain suggested by the A/B test when you roll it out. I agree that Bonferroni is often too pessimistic. If you Bonferroni correct, you'll usually find nothing is significant. And I take your point that you could adjust the $\alpha$. But then of course, you can make things significant or not as you like by the choice. False Discovery Rate is less conservative, and I have used it successfully in the past. People have strong incentives to find significant results that can be rolled out, so you don't want that person choosing $\alpha$. They will also be peeking at the results every day of a weekly test, and wanting to roll it out as soon as it bumps into significance. I just mention this because the most useful A/B libraries are ones that are resistant to human nature. Everywhere I've worked, PMs will talk about things being "almost significant" at 0.2.
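
To make the "Bonferroni is too pessimistic, FDR is less conservative" point concrete, here is a minimal sketch in plain Python. The function names and the ten p-values are made up for illustration and do not come from any particular A/B library; $\alpha = 0.05$ is just the conventional default.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / m (controls FDR under independence)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank  # keep the largest rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values from ten concurrent A/B tests (illustrative only).
example_pvals = [0.001, 0.008, 0.012, 0.019, 0.03, 0.2, 0.35, 0.5, 0.7, 0.9]

n_bonferroni = sum(bonferroni(example_pvals))
n_bh = sum(benjamini_hochberg(example_pvals))
print(n_bonferroni, n_bh)
```

On these numbers Bonferroni keeps 1 result and Benjamini–Hochberg keeps 4, which is the gap being described. Neither procedure does anything about daily peeking; that needs a sequential design on top.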
|
I'm considering the following:
- FWER: Holm–Bonferroni, Hochberg's step-up.
- FDR: Benjamini–Hochberg, Benjamini–Yekutieli.
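
For reference, a rough sketch of how three of these differ mechanically, in plain Python. The function names and the four p-values are hypothetical and are not the API of any existing library. Holm and Benjamini–Yekutieli hold under arbitrary dependence between tests, while Hochberg's step-up assumes independence or positive dependence.

```python
def _reject_k_smallest(pvals, order, k):
    """Helper: mark the k smallest p-values (by sorted order) as rejected."""
    reject = [False] * len(pvals)
    for i in order[:k]:
        reject[i] = True
    return reject


def holm(pvals, alpha=0.05):
    """Holm step-down (FWER): go from the smallest p-value up and stop at
    the first one that exceeds alpha / (m - j), with j starting at 0."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for j, i in enumerate(order):
        if pvals[i] > alpha / (m - j):
            break
        k = j + 1
    return _reject_k_smallest(pvals, order, k)


def hochberg(pvals, alpha=0.05):
    """Hochberg step-up (FWER): go from the largest p-value down and reject
    ranks 1..k for the first rank k with p_(k) <= alpha / (m - k + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank in range(m, 0, -1):
        if pvals[order[rank - 1]] <= alpha / (m - rank + 1):
            k = rank
            break
    return _reject_k_smallest(pvals, order, k)


def benjamini_yekutieli(pvals, alpha=0.05):
    """Benjamini-Yekutieli (FDR): the BH rule with alpha divided by the
    harmonic sum c(m) = 1 + 1/2 + ... + 1/m."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * c_m):
            k = rank  # keep the largest rank that passes
    return _reject_k_smallest(pvals, order, k)


# Hypothetical p-values from four tests (illustrative only).
four_pvals = [0.01, 0.02, 0.03, 0.04]
print(sum(holm(four_pvals)), sum(hochberg(four_pvals)),
      sum(benjamini_yekutieli(four_pvals)))
```

With these inputs at $\alpha = 0.05$, Holm rejects 1 hypothesis, Hochberg rejects all 4, and Benjamini–Yekutieli rejects none, so the choice of procedure and its dependence assumptions matters a lot on borderline results.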