by dfrankow 4314 days ago
1. Beware multiple tests: agreed. We could apply corrections (e.g., Bonferroni-like), but don't underestimate the logistical complication of doing so. Before a month starts, we don't even know how many experiments will be run (although we could try to predict). A different way to address this problem: use other information. Do the results make sense (good)? Is there corroborating evidence (good)? Are there crazy outliers or things that smell funny (bad)? Of course, the experimenter often has a bias (it worked!), so can they convince others? And so on.
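
For concreteness, here is a minimal sketch of what a Bonferroni-style correction looks like, next to Holm's step-down variant. The function names and the p-values are made up for illustration; this is not the procedure we actually run. Note that both need the total number of tests m up front, which is exactly the logistical catch.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: compare the k-th smallest p-value
    (k = 0, 1, ...) against alpha / (m - k) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is also not rejected
    return reject


# Hypothetical p-values from four experiments run in the same month.
p_values = [0.001, 0.015, 0.04, 0.30]
print(bonferroni(p_values))
print(holm(p_values))
```

Holm controls the same family-wise error rate as Bonferroni but never rejects fewer hypotheses, so it is usually the safer default when the count of tests is known.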

2. To compute statistical power, you need an estimate of effect size. For many experiments, we don't have one; we could run pilots, but in the naive case that doubles both the number of experiments to run and the time to wait. By default, we recommend a sample size that will detect a certain effect size (but not smaller ones). That is, we decide we are not interested in small effects, because that makes our lives simpler.
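
To illustrate how a minimum detectable effect size turns into a recommended sample size, here is a rough sketch using the standard normal approximation for a two-sided, two-group comparison of means. The function name, the defaults (alpha = 0.05, power = 0.80), and the use of a standardized effect size (Cohen's d) are assumptions for the example, not a description of our actual defaults.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (difference in means / std dev).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Halving the effect size you care about roughly quadruples the sample needed.
for d in (0.5, 0.2, 0.1):
    print(d, sample_size_per_group(d))
```

Because n grows with 1/d^2, deciding up front that small effects are not interesting is what keeps the required sample size manageable.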