The commenter isn't saying that there's a 1% chance of getting 8/10 coin flips. One way to test whether a procedure is better than random chance is a binomial test. It tells us how plausible a proposed success rate is given the observed results, assuming that every attempt is independent and has the same probability of success. Here, a success would be the procedure correctly distinguishing a coffee drinker from a non-coffee drinker using an MRI. To evaluate "better than random chance" (vs "different than random chance"), we use a one-sided test, which asks how likely results at least this good would be if the true success rate were only 50%.

In R, we can do this with: binom.test(8, 10, p=0.50, alternative="greater") which returns a p-value of 0.05469 and a 95% confidence interval of 0.4930987 to 1 [see note below]. This narrowly misses the traditional 0.05 significance threshold, which (in a simplified way) says that we're ok with a 5% chance of a false positive if we repeated this testing procedure many times. So yes, this isn't a p-value of 1%; it's about 5.5%.
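
If you don't have R handy, the 0.05469 can be reproduced by hand: with alternative="greater", the p-value is just the probability of 8 or more successes out of 10 when each attempt succeeds with probability 0.5. Here is a rough Python sketch of that sum. The function name binom_upper_tail is mine, not part of any library, and this only covers the one-sided "greater" case.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), i.e. the one-sided 'greater' p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 8 correct calls out of 10, tested against a 50% (coin flip) success rate.
p_value = binom_upper_tail(8, 10, 0.5)
print(round(p_value, 5))  # 0.05469, i.e. 56/1024
```

The 56 comes from the 45 + 10 + 1 ways of getting 8, 9, or 10 successes out of the 1024 equally likely outcomes.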

If we expand this to 20/25 successes, we have a lot more evidence. The p-value shrinks to 0.002039 and the 95% CI becomes 0.6245949 to 1 [see note below]. So we have strong evidence that the procedure has a success rate greater than 0.5. Without going deeply into the confusing aspects of how to correctly interpret confidence intervals, and instead reading them the way they are generally used, any success rate from 62% to 100% is reasonable if we're ok with a 5% false positive rate. That is still a wide range, and the width is what I'd say is important, whatever you think of the way people use CIs in practice.
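
To make the lower end of that interval less of a black box: for a one-sided exact (Clopper-Pearson) interval, which is what I understand binom.test to report, the lower bound is the success rate at which seeing 20 or more successes out of 25 would have exactly a 5% probability. The sketch below finds that rate by bisection. It is only an illustration: the function names are mine, and R computes the bound differently (via the beta distribution) but should agree to the digits shown.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def one_sided_lower_bound(k: int, n: int, alpha: float = 0.05, tol: float = 1e-10) -> float:
    """Success rate p at which P(X >= k) equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail(k, n, mid) < alpha:
            lo = mid  # k successes would be too surprising at this rate, so go higher
        else:
            hi = mid  # this rate is still compatible with the data, so go lower
    return (lo + hi) / 2


lower = one_sided_lower_bound(20, 25)
print(round(lower, 4))  # about 0.6246; the upper bound is 1 by construction
```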

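Another way to do this would be to test whether a true success rate of 80% is plausible for 20/25 successes, against the alternative that the true rate is lower than 80% (in R: binom.test(20, 25, p=0.80, alternative="less")). That has a p-value of 0.5793, which isn't significant at any commonly used level, so the data give us no reason to think the true rate is below 80%.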
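
For this direction the p-value is the lower tail instead: the probability of 20 or fewer successes out of 25 if the true success rate really were 80%. A small Python sketch of that calculation, again with a function name I made up for illustration:

```python
from math import comb


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), i.e. the one-sided 'less' p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


# 20 successes out of 25, tested against a hypothesised 80% success rate.
p_value = binom_lower_tail(20, 25, 0.8)
print(round(p_value, 4))  # 0.5793
```

Since 20/25 is exactly 80%, it makes sense that the result lands near the middle of the distribution rather than in a tail.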

In the end, you'd probably use something more sophisticated than a binomial test so that you can account or control for other factors, but hopefully this illustrates what you can do with small sample sizes.

Math details: https://en.wikipedia.org/wiki/Binomial_test#Usage

Note: The upper bound is always 1 because this is a one-sided binomial confidence interval (we're doing "greater than" rather than "different than"), so only the lower bound is estimated from the data.