Hacker News
by kqr 1626 days ago
That's another fallacy of frequentist reasoning, that we have to draw definitive conclusions from evidence. That something is definitely false until we have "statistical significance" where it all of a sudden becomes definitely true.

In real life, to borrow your description, we can hold varying levels of belief in statements depending on how strong the evidence is, and the magnitude of the payoff in the various cases.

Maybe the probability of the result in the study in question is 51 %. That's still more than 50 %. Whether that difference is meaningful to you is not something someone else can decide.


Nobody who knows what they are doing, and uses statistics, can flip from something being definitely true to definitely false. At best, they can find overwhelmingly convincing probabilities close to 0 or 1.

Honest scientists who use statistics do not claim that an effect does not exist. Rather, they claim that the experiment that was conducted did not produce sufficient evidence (to a numerically defined standard) to justify believing in the effect.

That is to say, that the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified.

What that means is that exactly the same results as were observed will, or would, with a high probability, also be observed if the experiment occurs in the null hypothesis universe: the world in which the effect is absent.

So even if we are not in that universe (the effect is real), the experiment didn't show it.

The experiment simply doesn't discriminate between the null hypothesis and its negation to a level that could convince one to hold a probabilistic belief in the existence of the effect.

> the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified

You have this completely backwards. It means that the likelihood of the null hypothesis was not below some threshold such that it can be "ruled out". It says absolutely nothing about the likelihood of the data if the effect exists.

Of course, but the fact that people apply a binary threshold tells you that they want to be able to rule out some things from their models entirely, and include other things as something that's as good as a true fact.

What does a non-binary threshold look like, and how is it different from just fine-tuning a regular binary threshold to err more or less on the side of caution?

It's not about a non-binary threshold. The problem is having a threshold in the first place.

Say that given the evidence there's a 9 % chance the null hypothesis is true. A frequentist used to a 10 % significance level would then say the effect is true. A frequentist trained on a 5 % significance level would say it is false.

But that's just an arbitrary cutoff that by itself means nothing.

If instead we look at a practical scenario where we would use this result, we understand the problem space better. Maybe we have figured out how to get limited rights to the transpositions of famous music to other A4s, and this would cost a lot to do, but earn us some money if we do it and the effect is real.

Should we acquire those rights or not? Ask the 5 % frequentist and they would say "there's no significant difference, so you shouldn't." Ask the 10 % frequentist and they would say the opposite. Who do we listen to?

Let's ask the poker player. They will ask "What exactly does it cost to do this, and how much will you earn if it works out? That matters!"

So let's say it costs us $100 per song to get the rights to the transposed version, and we think it will earn us $102 per song if the effect is true. Now we can just plug in, remembering that there's a 9 % chance the null hypothesis is correct:

-100 + 102 * 0.91 = -7.18 per song

Not good. What if we made $127 per song with the same cost?

-100 + 127 * 0.91 = +15.57 per song

Worth doing, at the same probability of the null hypothesis!
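
For anyone who wants to play with the numbers, here is a minimal Python sketch of the same arithmetic. The function names `expected_value` and `break_even_payoff` are made up for illustration, and the sketch assumes (as the calculation above does) that the cost is paid up front no matter what and the payoff is earned only if the effect is real.

```python
def expected_value(cost, payoff, p_null):
    """Expected profit per song.

    Assumes the cost is always paid and the payoff is earned only
    when the effect is real, i.e. with probability 1 - p_null.
    """
    return -cost + payoff * (1 - p_null)


def break_even_payoff(cost, p_null):
    """Payoff per song at which the expected profit is exactly zero."""
    return cost / (1 - p_null)


if __name__ == "__main__":
    p_null = 0.09  # 9 % chance the null hypothesis is correct
    for payoff in (102, 127):
        ev = expected_value(100, payoff, p_null)
        print(f"payoff ${payoff}: expected {ev:+.2f} per song")
    print(f"break-even payoff: ${break_even_payoff(100, p_null):.2f}")
```

Under those assumptions the break-even payoff is roughly $109.89 per song: anything above that is worth doing at a 9 % chance of the null, anything below is not, and no significance level enters the calculation.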

In other words, you can't determine what's significant until you know how you will use the result.

Statistics by itself is meaningless. It gains meaning only when it's used to choose between actions.

> Should we acquire those rights, or not?

That's a binary decision, which is bad, so we shouldn't.

In accordance with the confidence probability, we should buy into a percentage of the rights, and transpose the music to an interpolated value between A=432 and A=440 Hz.

Huh, yeah, you might be right!

This is not about fallacy or frequentism.

You badly misunderstand