by alan-crowe 5722 days ago

The traditional 5% significance threshold comes from rules of thumb devised for agricultural research stations in the 1930s. The basic framework is that each experiment takes many months, a large plot of land, and plenty of money. You test crop varieties that you are already fairly confident will give a better yield, in order to check that they really do.

Suppose your initial guesses are right 50% of the time and over some years you run 200 tests. 100 times the crop really does yield better, and most of those show up fine. 100 times the crop doesn't actually yield better, and 5% of those result in false positives. You end up with around 100 true positives and 5 false positives. A positive result really means something.
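The arithmetic above can be sketched in a few lines of Python. The function name and its `power` parameter are illustrative assumptions, not part of the original comment; `power=1.0` stands in for "most of those show up fine", so that all 100 real improvements are detected.

```python
def positive_predictive_value(n_tests, prior_true, alpha, power):
    """Expected true positives, false positives, and the share of positives that are real.

    power is the assumed fraction of real effects that the test detects.
    """
    n_true = n_tests * prior_true        # experiments where the crop really is better
    n_null = n_tests - n_true            # experiments where it is not
    true_pos = n_true * power            # real effects that test positive
    false_pos = n_null * alpha           # null effects that test positive anyway
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# 200 trials, 50:50 prior, 5% threshold, assumed perfect detection of real effects.
tp, fp, ppv = positive_predictive_value(200, 0.5, 0.05, 1.0)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.3f}")
```

With a more cautious assumed power, say 0.8, the share of positives that are real barely moves, because the 50:50 prior does most of the work.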

Fast forward 80 years and research has changed. You have high-throughput screening machines and can test 100,000 different molecules in your hunt for a new antibiotic. Suppose you have got lucky and there really is a new antibiotic in your combinatorial explosion of side chains. A significance threshold of 5% gives you about 5000 false positives. With any luck you don't get a false negative and your new antibiotic also makes it through the initial screen. Now you have 5001 +/- 70 positives. The probability that a given positive result is true is only about 0.0002, or 0.02%. A positive result still means something important. You are searching for a needle in a haystack and you have discarded 95% of the hay, but there is still plenty of hay left, and 99.98% of the positive results are wrong.
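Here is a rough check of the screening numbers, treating each inactive molecule as an independent 5% chance of a false positive (a binomial model, which is an assumption layered on the comment). The variable names are illustrative, and the one real antibiotic is assumed to pass the screen.

```python
import math

n_molecules = 100_000   # molecules screened
n_real = 1              # genuinely active molecules (assumed to test positive)
alpha = 0.05            # significance threshold

n_null = n_molecules - n_real
expected_fp = n_null * alpha                       # mean of Binomial(n_null, alpha)
sd_fp = math.sqrt(n_null * alpha * (1 - alpha))    # its standard deviation
expected_pos = expected_fp + n_real
ppv = n_real / expected_pos                        # chance a given positive is real

print(f"false positives: {expected_fp:.0f} +/- {sd_fp:.0f}")
print(f"probability a positive is real: {ppv:.4%}")
```

The standard deviation comes out just under 70, which is where the "+/- 70" comes from.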