HN Mirror


by jrdorn 1574 days ago

Most interactions between tests are minor and indistinguishable from random noise, so they aren't worth worrying about.

There are two ways to deal with the bigger interaction effects. The first is to predict ahead of time which experiments will meaningfully interact and split the sample between them, which makes the tests take longer. The second is to run the tests in parallel on all users and check the data for interaction effects after the fact, which may mean invalidating some results.
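To make the two options concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration, not a description of any particular platform: the function names (`bucket`, `assign_exclusive`, `assign_parallel`, `interaction_z`), the hash-based bucketing, the two-variant setup, and the simple difference-in-differences z-score used as the after-the-fact check.

```python
import hashlib
import math


def bucket(user_id: str, salt: str, n_buckets: int) -> int:
    """Deterministically map a user to one of n_buckets, keyed by a salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets


def assign_exclusive(user_id: str, experiments: list[str]) -> dict[str, int]:
    """Option 1: split the sample. Each user sees exactly one experiment."""
    exp = experiments[bucket(user_id, "layer", len(experiments))]
    return {exp: bucket(user_id, exp, 2)}  # 0 = control, 1 = treatment


def assign_parallel(user_id: str, experiments: list[str]) -> dict[str, int]:
    """Option 2: every user is in every experiment, hashed independently."""
    return {exp: bucket(user_id, exp, 2) for exp in experiments}


def interaction_z(cells: dict[tuple[int, int], tuple[int, int]]) -> float:
    """After-the-fact check for two parallel tests A and B.

    cells[(a, b)] = (conversions, users) for users in variant a of A and
    variant b of B. Returns a z-score for the difference between A's effect
    with B on and A's effect with B off (a difference-in-differences).
    """
    rate = {k: conv / n for k, (conv, n) in cells.items()}
    effect = (rate[(1, 1)] - rate[(1, 0)]) - (rate[(0, 1)] - rate[(0, 0)])
    variance = sum(rate[k] * (1 - rate[k]) / n for k, (_, n) in cells.items())
    return effect / math.sqrt(variance)


if __name__ == "__main__":
    tests = ["button_color", "headline_copy"]  # hypothetical experiment names
    print(assign_exclusive("user-42", tests))
    print(assign_parallel("user-42", tests))
    made_up_counts = {(0, 0): (100, 1000), (0, 1): (120, 1000),
                      (1, 0): (110, 1000), (1, 1): (50, 1000)}
    print(interaction_z(made_up_counts))
```

With the exclusive split, each experiment only gets a fraction of the traffic, which is why those tests take longer. With the parallel assignment, a large absolute z-score from something like `interaction_z` would be the signal that a pair of results may need to be thrown out. The counts in the example are made up.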

In our experience, a mix of these approaches works best. It's really hard to predict meaningful interaction effects ahead of time, so save that for the really obvious cases (e.g. black text on a black background). For everything else, the benefit of running more experiments usually outweighs the cost of occasionally needing to throw out results because of interaction effects. It's much better to run 10 tests and have to throw out 1 than to run only 5 tests.