by wch
4545 days ago
You're getting "false positives" with the method you've chosen, but it's not a method that would be accepted in a scientific paper as evidence for an experimental effect. Your method may be more appropriate in, say, a machine learning context, but it's not what would be used in a paper like this.
First, the statistical tests used for these experiments don't use Bayesian statistics, so the 50% prior probability that the die is loaded simply isn't factored in. The standard is null-hypothesis testing, which asks, roughly: if the null hypothesis is true (that is, if there is no actual difference between the populations, for example experimental groups A and B), what is the probability of seeing a pattern at least as extreme as the one observed in the data? The tests take sample size into account when calculating this probability. If you throw the die once, the test you'd use here (chi-square) would _never_ give you a false positive, that is, a p-value below .05. With small samples, there is too little power to reach the required p-value. (And I'll note that chi-square is one of the tests used in these papers.)
There's a whole other debate about whether p-values and null-hypothesis tests are the right thing to use, whether the standard 0.05 threshold is small enough, whether Bayesian statistics should be used instead, and so on. These are legitimate issues. But they're separate from the claim that small samples increase the likelihood of a false positive.
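The claim about a single throw can be checked directly. The Python sketch below, using only the standard library, computes the Pearson chi-square statistic against a fair six-sided die and its upper-tail probability with 5 degrees of freedom, using the closed form for odd degrees of freedom instead of SciPy. The helper names (`chi2_sf_df5`, `chi2_statistic`, `smallest_p_value`) are illustrative, not from any library. Because the most lopsided possible result is every throw landing on the same face, `smallest_p_value(n)` is the lowest p-value this test can produce with n throws.

```python
import math


def chi2_sf_df5(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 5 degrees of freedom.

    Uses the closed form for odd degrees of freedom (Abramowitz & Stegun 26.4.4),
    so no SciPy dependency is needed.
    """
    if x <= 0:
        return 1.0
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    series = root + x * root / 3  # x^(1/2) / 1 + x^(3/2) / (1 * 3)
    return math.erfc(root / math.sqrt(2)) + 2 * density * series


def chi2_statistic(counts: list[int]) -> float:
    """Pearson chi-square statistic for observed face counts against a fair die."""
    n = sum(counts)
    expected = n / 6
    return sum((observed - expected) ** 2 / expected for observed in counts)


def smallest_p_value(throws: int) -> float:
    # The statistic is largest when every throw lands on the same face.
    counts = [throws, 0, 0, 0, 0, 0]
    return chi2_sf_df5(chi2_statistic(counts))


for n in range(1, 6):
    print(n, round(smallest_p_value(n), 4))
```

With one throw the statistic is at most 5 (p ≈ 0.42), and with two throws at most 10 (p ≈ 0.075), so neither can cross .05; three throws all landing on the same face can. The chi-square approximation is itself unreliable when expected counts are this small, so this shows the mechanics of the test rather than endorsing it for tiny samples.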
(I know of the debates. As far as I'm concerned, the Bayesians have won by an overwhelming margin. The only advantage of Frequentist statistics is their relative ease of use. But in the search for truth, you just can't escape Probability Theory. Period. My method wouldn't be accepted in a paper? Then fuck the papers. I'm not trying to get published, I'm trying to get to the truth.)
I don't have a proof nailed down, but based on the examples I can come up with, I'm extremely confident that, as long as you use probability theory correctly, small sample sizes do increase the chance of false positives. On the other hand, those false positives will be weaker than the exceptional false positive you might get from a larger sample. (Imagine I throw the die 30 times and get zero sixes and 10 ones. That's very rare, but it would make me all the more confident that the die is loaded.) If you use that crappy, outdated Frequentist junk, however, all bets are off.
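The claim above can be explored numerically once the loaded die is given an explicit model. This Python sketch assumes a hypothetical loaded die that shows a six with probability 1/3 (spreading the remaining probability evenly over the other faces), against a fair die's 1/6, with the thread's 50% prior that the die is loaded. For a die that is actually fair, it computes exactly, by summing over the binomial distribution of the number of sixes, how often the posterior ends up favoring "loaded" (a Bayesian false positive) and how strong that posterior is on average when it happens. The loading model and the function names are assumptions for illustration; a different loading, such as the excess of ones in the example above, would give different numbers.

```python
from math import comb

P_SIX_FAIR = 1 / 6
P_SIX_LOADED = 1 / 3  # hypothetical: the loaded die shows a six twice as often
PRIOR_LOADED = 0.5    # the 50% prior that the die is loaded


def posterior_loaded(sixes: int, throws: int) -> float:
    """P(loaded | data), given the number of sixes in a sequence of throws.

    Under this model the number of sixes is a sufficient statistic, and the
    binomial coefficient cancels, so it is left out here.
    """
    like_loaded = P_SIX_LOADED**sixes * (1 - P_SIX_LOADED) ** (throws - sixes)
    like_fair = P_SIX_FAIR**sixes * (1 - P_SIX_FAIR) ** (throws - sixes)
    numerator = PRIOR_LOADED * like_loaded
    return numerator / (numerator + (1 - PRIOR_LOADED) * like_fair)


def false_positive_profile(throws: int) -> tuple[float, float]:
    """For a die that is actually fair, return (probability that the posterior
    favors 'loaded', average posterior in exactly those cases)."""
    hit_prob = 0.0
    weighted_post = 0.0
    for k in range(throws + 1):
        # Probability of seeing k sixes from a fair die.
        p_k = comb(throws, k) * P_SIX_FAIR**k * (1 - P_SIX_FAIR) ** (throws - k)
        post = posterior_loaded(k, throws)
        if post > 0.5:
            hit_prob += p_k
            weighted_post += p_k * post
    avg_post = weighted_post / hit_prob if hit_prob else float("nan")
    return hit_prob, avg_post


for n in (1, 5, 10, 30, 100):
    rate, strength = false_positive_profile(n)
    print(f"{n:>3} throws: P(false positive) = {rate:.3f}, "
          f"mean posterior when it happens = {strength:.3f}")
```

With a single throw, for instance, a six (probability 1/6 for a fair die) lifts the posterior to 2/3, so the false-positive rate is 1/6 at a strength of 2/3; the printed rows for larger samples show how rate and strength trade off under this particular model.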
---
Note, however, that in a sense you are correct: by conservation of expected evidence, the probability-weighted average of the evidence you expect to see is exactly zero. If it were not, you would already have updated your current belief until it was. This means that if you expect a lot of weak evidence in one direction, you must also expect a small amount of very strong evidence in the other direction.
I'm not sure this is what you were getting at, though.
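Conservation of expected evidence can be checked on a small example. This Python sketch reuses a hypothetical loading (the loaded die shows a six with probability 1/3, the fair die 1/6) and a 50% prior that the die is loaded; these numbers and the function names are assumptions for illustration only. For every possible count of sixes it computes the marginal probability of that outcome and the resulting posterior, then takes the probability-weighted average of the posteriors, which should equal the prior. It also splits the outcomes into those that raise and those that lower the posterior.

```python
from math import comb

P_SIX_FAIR = 1 / 6
P_SIX_LOADED = 1 / 3  # hypothetical loading, for illustration only
PRIOR_LOADED = 0.5    # prior probability that the die is loaded


def outcome_table(throws: int) -> list[tuple[float, float]]:
    """For each possible number of sixes, return (marginal probability, posterior P(loaded))."""
    table = []
    for k in range(throws + 1):
        like_loaded = comb(throws, k) * P_SIX_LOADED**k * (1 - P_SIX_LOADED) ** (throws - k)
        like_fair = comb(throws, k) * P_SIX_FAIR**k * (1 - P_SIX_FAIR) ** (throws - k)
        p_k = PRIOR_LOADED * like_loaded + (1 - PRIOR_LOADED) * like_fair
        table.append((p_k, PRIOR_LOADED * like_loaded / p_k))
    return table


def expected_evidence(throws: int) -> dict[str, float]:
    table = outcome_table(throws)
    up = [(p, post) for p, post in table if post > PRIOR_LOADED]
    down = [(p, post) for p, post in table if post < PRIOR_LOADED]
    p_up = sum(p for p, _ in up)
    p_down = sum(p for p, _ in down)
    return {
        # Conservation of expected evidence: this equals the prior.
        "expected_posterior": sum(p * post for p, post in table),
        "p_up": p_up,
        "mean_posterior_if_up": sum(p * post for p, post in up) / p_up,
        "p_down": p_down,
        "mean_posterior_if_down": sum(p * post for p, post in down) / p_down,
    }


for n in (1, 10):
    print(n, expected_evidence(n))
```

With a single throw, the posterior rises to 2/3 with probability 1/4 and falls to 4/9 with probability 3/4: a frequent small drop balanced by a rarer, larger rise, so the probability-weighted average stays at exactly 1/2.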
---
When we do null-hypothesis testing, we do assume a prior: using a smaller p-value threshold means we're more skeptical of the competing hypothesis; we hold a stronger prior belief that it is false. But we never say the word "prior", so we can pat ourselves on the back for our "objectivity" and scold the Bayesian for his "subjectivity". Priors, what arrogance. Who is he to believe such-and-such in the first place? We do science, not faith.
Only we're blind to our own priors.
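One standard way to make that implicit prior explicit (a textbook identity, not part of the original comment): write π for the prior probability of the competing hypothesis H₁, α for the significance threshold, and 1 - β for the test's power. Bayes' theorem then gives the probability that H₁ is true given a significant result, and the break-even prior at which that probability reaches 1/2:

$$
P(H_1 \mid p < \alpha) = \frac{(1-\beta)\,\pi}{(1-\beta)\,\pi + \alpha\,(1-\pi)},
\qquad
P(H_1 \mid p < \alpha) \ge \tfrac{1}{2} \iff \pi \ge \frac{\alpha}{\alpha + 1 - \beta}.
$$

With an assumed power of 0.8, the break-even prior is about 0.059 at α = 0.05 and about 0.0062 at α = 0.005: a stricter threshold is what it takes to convince someone who starts out more skeptical of H₁.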