by barnabees 1874 days ago
"Fifty-six subjects were recruited (32 CD and 24 NCD). One participant from the CD group was excluded due to imaging artifacts, rendering a final sample of 31 CD and 24 NCD."

Feels like too small a sample size to draw any strong conclusions.

Not necessarily. Suppose, for simplicity, that we have an equal number N of coffee drinkers and non-coffee drinkers, for a total of 2N. Suppose you try your model on them, and it tells which is which based on their MRI 80% of the time. Let’s assume the null hypothesis is that your model doesn’t work and you were just randomly lucky. If N = 5, that’s 8 successes in 10 Bernoulli trials, which is already a p-value of 0.01. With N = 25 (close to the study), that’s a p-value of 0.002, much better than random chance.

Whether N = 50 is too small a sample really depends on the strength of the effect you are trying to detect. For strong effects it’s plenty.

Thank you for this. It's annoying to see people dismiss studies based on some general idea of "not enough people" without backing it up with numbers. Great to see someone back it up!
Your math is whack. 8/10 coin flips being a 1% chance?
The commenter isn't saying that there's a 1% chance of getting 8/10 coin flips. One way that we can test whether a procedure is better than random chance is to use a binomial test. This helps us understand whether a proposed rate of success is feasible or not, assuming that every attempt has equal probability. Here, success would be that the procedure distinguishes between coffee and non-coffee drinkers using an MRI. To evaluate "better than random chance" (vs "different from random chance"), we can use a one-sided binomial test, which asks how likely a result at least this good would be if the true rate of success were only 50%.

In R, we can do this with: binom.test(8, 10, p=0.50, alternative="greater") which returns a p-value of 0.05469 and a 95% confidence interval of 0.4930987, 1 [see note below]. This doesn't meet the traditional threshold of 95% confidence, which (in a simplified way) says that we're ok with a 5% chance of a false positive if we repeated this testing procedure many times. So yes, this isn't a p-value of 1%.
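For anyone without R handy, the same one-sided p-value can be reproduced straight from the binomial formula. This is a minimal Python sketch using only the standard library, not a description of what binom.test does internally; the function name binom_tail_greater is made up for illustration.

```python
from math import comb

def binom_tail_greater(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p).

    This is the one-sided ("greater") p-value when p is the
    success rate assumed under the null hypothesis.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# 8 or more successes out of 10 under pure guessing: 56 / 1024
p_value = binom_tail_greater(8, 10)
print(round(p_value, 5))  # 0.05469
```

The sum only has three terms here (8, 9, and 10 successes), which is why the result is noticeably bigger than the chance of exactly 8 successes.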

If we expand this to 20/25 successes, we have a lot more evidence. The p-value shrinks to 0.002039 and the 95% CI becomes 0.6245949, 1 [see note below]. So we have strong evidence that the procedure has a success rate greater than 0.5. Without going deeply into the confusing aspects of how to correctly interpret confidence intervals, and instead interpreting them as they are generally used, anywhere from 62% to 100% is plausible if we're ok with a 5% false positive rate. That is still a wide range (the width is what I'd say is important, ignoring the way people use CIs in practice).

Another way to do this would be to get a p-value as to how feasible a less-than 80% success rate is for 20/25 successes. That has a p-value of 0.5793, which isn't significant at any commonly used level.
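Both 20/25 figures can be checked by hand-rolling the two binomial tails. The sketch below is in Python with only the standard library; the helper names (binom_pmf, p_greater, p_less) are invented for this example, and I'm assuming the 0.5793 figure corresponds to a "less" test against a null rate of 0.8, since that is what reproduces it.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials at success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_greater(successes, trials, p):
    """One-sided p-value, alternative 'greater': P(X >= successes)."""
    return sum(binom_pmf(k, trials, p) for k in range(successes, trials + 1))

def p_less(successes, trials, p):
    """One-sided p-value, alternative 'less': P(X <= successes)."""
    return sum(binom_pmf(k, trials, p) for k in range(0, successes + 1))

# Is 20/25 better than guessing? Null rate 0.5, alternative "greater".
print(p_greater(20, 25, 0.5))  # about 0.002039

# Is 20/25 evidence of a rate below 80%? Null rate 0.8, alternative "less".
print(p_less(20, 25, 0.8))     # about 0.5793
```

The second number is large because 20/25 is exactly 80%, so the data sit right at the null value and give no reason to think the true rate is lower.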

In the end, you'd probably use something more sophisticated than a binomial test so that you can account / control for other factors, but hopefully this illustrates what you can do with small sample sizes.

Math details: https://en.wikipedia.org/wiki/Binomial_test#Usage

Note: The upper range is always 1 because of details around calculating a one-sided binomial confidence interval (we're doing "greater than" rather than "different than").
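To make the note concrete: R documents binom.test as using Clopper-Pearson intervals, and for a one-sided "greater" interval the lower end is the success rate at which the observed result would have exactly a 5% chance of being matched or beaten. Here is a rough Python sketch that finds it by bisection; the function names and the tolerance are my own choices, not anything taken from R.

```python
from math import comb

def tail_greater(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

def one_sided_lower_bound(successes, trials, alpha=0.05, tol=1e-9):
    """Smallest success rate p with P(X >= successes) >= alpha.

    The tail probability grows as p grows, so bisection on [0, 1]
    is enough. The upper end of the interval is simply 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_greater(successes, trials, mid) < alpha:
            lo = mid  # rate too low to explain the data; move up
        else:
            hi = mid
    return (lo + hi) / 2

print(one_sided_lower_bound(8, 10))   # should land near 0.4931
print(one_sided_lower_bound(20, 25))  # should land near 0.6246
```

Nothing in the search ever pushes the upper end below 1, which is the "always 1" behavior described in the note.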