
Thank you - your comment is right, and I conflated two things that are conceptually totally different.

For a fixed number of experiments and a fixed block of time (i.e. a fixed budget of samples), it's not useful to say that bandits have higher power or a worse FPR, because both values are adjustable: you can trade one against the other by moving the decision threshold. F1 or AUC would probably be the right way to compare, and it seems likely to me that bandits have better precision-recall curves. So this part of my argument was irrelevant to the point, and if anything it favors bandits.

I was really thinking about the scenario you mentioned, where the number of experiments is unconstrained and old experiments run long. Bandits will spend a lot of their bandwidth on very marginal improvements that are below the effect size cutoff that a shorter fixed-length RCT would set. I think you can fix this with early stopping (or just stopping), so maybe it's not really an issue after all.
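To make the "just stop" idea concrete, here is a rough sketch of what I have in mind. Everything in it is my own assumption rather than a standard recipe: Bernoulli arms, Beta(1, 1) priors, Thompson sampling, and a made-up stopping rule that quits once the posterior says no arm is likely to beat the current leader by more than `min_effect`. The function name `run_bandit` and all of its parameters are hypothetical.

```python
import random


def run_bandit(true_rates, min_effect=0.01, confidence=0.95,
               max_pulls=20_000, check_every=500, draws=2_000, seed=0):
    """Thompson sampling on Bernoulli arms with a simple stopping rule.

    Assumed rule (illustrative only): stop once the posterior probability
    that any arm beats the current leader by at least `min_effect` has
    dropped to 1 - confidence or below.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    wins = [0] * k
    losses = [0] * k

    def posterior_sample(i):
        # Beta(1, 1) prior updated with the observed wins and losses.
        return rng.betavariate(wins[i] + 1, losses[i] + 1)

    def summary(pulls, leader, stopped):
        return {
            "pulls": pulls,
            "leader": leader,
            "stopped": stopped,
            "pulls_per_arm": [wins[i] + losses[i] for i in range(k)],
        }

    for pull in range(1, max_pulls + 1):
        # Thompson step: draw once per arm, play the arm with the top draw.
        arm = max(range(k), key=posterior_sample)
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

        if pull % check_every == 0:
            # Leader = arm with the highest posterior mean.
            leader = max(
                range(k),
                key=lambda i: (wins[i] + 1) / (wins[i] + losses[i] + 2),
            )
            # Monte Carlo estimate of P(best arm - leader < min_effect).
            negligible = 0
            for _ in range(draws):
                sample = [posterior_sample(i) for i in range(k)]
                if max(sample) - sample[leader] < min_effect:
                    negligible += 1
            if negligible / draws >= confidence:
                return summary(pull, leader, True)

    leader = max(
        range(k),
        key=lambda i: (wins[i] + 1) / (wins[i] + losses[i] + 2),
    )
    return summary(max_pulls, leader, False)
```

Something like `run_bandit([0.05, 0.5])` should stop quickly, while two arms that differ by less than `min_effect` run until the posterior is tight enough to say the gap doesn't matter, or until `max_pulls`. The `min_effect` parameter plays the same role as the effect size cutoff in the fixed-length RCT.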

Thanks for helping clarify my thinking on this :)