by cschmidt
684 days ago
It would probably be good to have something that handles multiple comparisons (False Discovery Rate, Bonferroni correction), which are often the bane of running a whole series of A/B tests. And, as another poster has mentioned, an anytime approach that is resistant to the early-stopping problems caused by peeking [1].

For those who haven't read about Fisher's tea experiment: there was a woman who claimed she could tell whether the milk was put into the cup before or after the tea was poured. Fisher didn't think so, and developed the experimental technique to test the claim. Indeed she could, getting them all right iirc.

[1] see https://media.trustradius.com/product-downloadables/UP/GB/AD... for a discussion of the problems with a t-test. There is also a more detailed whitepaper from Optimizely somewhere.
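For the multiple-comparisons point, here is a minimal Python sketch of the two corrections named above, Bonferroni (controls the family-wise error rate) and the Benjamini-Hochberg step-up procedure (controls the false discovery rate), run on made-up p-values. The function names are hypothetical. The tea helper assumes Fisher's classic design of eight cups, four with milk poured first, where guessing every cup correctly by chance has probability 1 in C(8,4) = 70.

```python
from math import comb


def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate).

    Sort p-values ascending, find the largest rank k with p_(k) <= k/m * alpha,
    then reject the hypotheses with the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def tea_all_correct_probability(cups=8, milk_first=4):
    """Chance of labelling every cup correctly by pure guessing, when the
    taster knows how many cups had the milk poured first."""
    return 1 / comb(cups, milk_first)


# Hypothetical p-values from five A/B tests run side by side.
p = [0.038, 0.001, 0.20, 0.012, 0.025]
print(bonferroni(p))          # [False, True, False, False, False]
print(benjamini_hochberg(p))  # [True, True, False, True, True]
print(tea_all_correct_probability())  # 1/70, about 0.0143
```

With five tests, Bonferroni only keeps the single result below 0.05 / 5 = 0.01, while BH accepts a controlled fraction of false discoveries and keeps four. For real use, `statsmodels.stats.multitest.multipletests` implements both (`method="bonferroni"` and `method="fdr_bh"`).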
[1] https://github.com/assuncaolfi/savvi/
[2] https://openreview.net/forum?id=a4zg0jiuVi