by jansenderr 1626 days ago
Really? I think you are being extremely unfair to the authors here. They are simply stating that blood pressure went down, but not significantly. There is nothing wrong with that. They are not making claims either way; they are just reporting the results of the study and literally describing what happened. Please enlighten me: what claims are they allegedly making? That the blood pressure decreased in a non-significant way?
4 comments

> "Non-significant" in scientific usage means "within the error bars". It means that it is indistinguishable from randomness.
That's a bit of an oversimplification that leaves out important qualifiers. "Non-significant" usually means it's indistinguishable from a null hypothesis when a certain level of randomness is allowed in a single, isolated trial.

Who picks the null hypothesis? How is it picked, i.e. why does that specific hypothesis get favourable treatment? What level of randomness should one allow? What does it even mean for an experiment to be a single, isolated trial? How can anything be?

Those are critical questions to understand the concept, and your explanation just pretends they don't exist.

> You may be technically correct, but in real life where the rest of us live, it means this study can't be used to draw any conclusions.
That's another fallacy of frequentist reasoning, that we have to draw definitive conclusions from evidence. That something is definitely false until we have "statistical significance" where it all of a sudden becomes definitely true.

In real life, to borrow your description, we can hold varying levels of belief in statements depending on how strong the evidence is, and the magnitude of the payoff in the various cases.

Maybe the probability of the result in the study in question is 51 %. That's still more than 50 %. Whether that difference is meaningful to you is not something someone else can decide.

Nobody who knows what they are doing, and uses statistics, can flip from something being definitely true to definitely false. At best, they can find overwhelmingly convincing probabilities close to 0 or 1.

Honest scientists who use statistics do not claim that an effect does not exist. Rather, they claim that the experiment that was conducted did not produce sufficient evidence (to a numerically defined standard) to justify believing in the effect.

That is to say, that the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified.

What that means is that exactly the same results as were observed will, or would, with a high probability, also be observed if the experiment occurs in the null hypothesis universe: the world in which the effect is absent.

So even if we are not in that universe (the effect is real), the experiment didn't show it.

The experiment simply doesn't discriminate between the null hypothesis and its negation to a level that could convince one to hold a probabilistic belief in the existence of the effect.
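
To make that concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not taken from the study: 15 listeners per group, a blood pressure spread of 10 mmHg, a true effect of -3 mmHg, and a simple z-test with known spread. It counts how often a simulated study reaches p < 0.05 when the effect is absent and when it is real.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_effect: float, n_per_group: int, sd: float,
                   alpha: float = 0.05, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated studies that come out 'significant' (p < alpha)."""
    rng = random.Random(seed)
    # Standard error of the difference between two group means (sd assumed known).
    se = sd * math.sqrt(2 / n_per_group)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if two_sided_p(diff / se) < alpha:
            hits += 1
    return hits / trials


# Hypothetical numbers: all of these are made up for the example.
null_rate = rejection_rate(true_effect=0.0, n_per_group=15, sd=10.0)
real_rate = rejection_rate(true_effect=-3.0, n_per_group=15, sd=10.0)
print(f"effect absent:  significant in {null_rate:.0%} of simulated studies")
print(f"effect present: significant in {real_rate:.0%} of simulated studies")
```

With these made-up numbers, most simulated studies come out non-significant in both universes, which is what it looks like when a small experiment fails to discriminate between them.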

> the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified

You have this completely backwards. It means that the likelihood of the null hypothesis was not below some threshold such that it can be "ruled out". It says absolutely nothing about the likelihood of the data if the effect exists.

Of course, but the fact that people apply a binary threshold tells you that they want to be able to rule out some things from their models entirely, and include other things as something that's as good as a true fact.
This is not about fallacy or frequentism.

You badly misunderstand.

The null hypothesis, in a nutshell, is the proposition that the effects which the experiment is designed to look for do not exist.

The obvious and only possible null hypothesis in this situation is that tuning from 432 to 440 does absolutely diddly squat to the listener's physiology.

(Next day reply, after reading this discussion a couple of times)

> "432 Hz tuned music was associated with a slight decrease of mean (systolic and diastolic) blood pressure values (although not significant)"

I can see why this line is fine to some and bothersome to others.

Strictly, it's just describing their results. No problem. The numbers are what they are.

On the other hand, why draw attention to the difference in means, when they are about to tell you not to take it very seriously?

This version avoids that:

"432 Hz tuned music was not associated with a significant difference in mean (systolic and diastolic) blood pressure values"

Maybe it's my age, origin, or personality, but I prefer this version.

"Not significant" means that the probability is >=5% their result was obtained by chance.

We've settled as a community on a convention that we don't claim an effect is real until it is supported by data ("statistically significant"), i.e. <5% likely to be explained by chance in your results.

"Significant" does not mean big or important in this context. It means better than 5% unlikely to be (un)lucky data.

The threshold for significance lies in the eye of the beholder. A particle physicist might not be satisfied with anything over 0.01 %. A social scientist might be happy to see 10 %.

The 5 % number you mention is completely arbitrary and often woefully inappropriate.

Look at it from a betting perspective. Can you earn more than 10 × your investment if the null hypothesis is false? Then anything less likely than 10 % is significant.

It's a convention for scientific reporting. Your trades are not bound by this convention.

The parameter value is not arbitrary. It's a convention arrived at after hundreds of years. If it were arbitrary, p=0.999 or p=0.00001 would be just as good. We've settled on p=0.05 being usefully convincing but not crazy demanding to obtain by experiment with noisy measurements.

Null hypothesis testing was invented less than 100 years ago by Fisher, who completely arbitrarily picked 0.05 [0]. That value was not arrived at through wisdom of experience, and certainly not after hundreds of years of practice.

Though it has now indeed become conventional to test with p=0.05, there is nothing wrong with reporting an effect that fails the null hypothesis test. At least that is the position of the American Statistical Association [1].

[0] https://www.cantorsparadise.com/what-is-the-meaning-of-p-val...

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5017929/

Thanks for these refs. I read [1] carefully and I take your point that it’s ok to strictly report whatever the data says.

On the value itself, we are quibbling about the meaning of ‘arbitrary’: Fisher certainly could have chosen another value, but not all values would be considered useful. Some expertise about the nature of real world data and the minds of statisticians is encoded in the chosen value.

If I propose that we change the convention to use 1e-12 instead and you think ‘that’s too small, I prefer it the way it is’, then it’s not arbitrary in the sense I mean.

The thing you seem to be missing is that there's no one number that's a meaningful limit for all purposes.

What probability you accept as significant should depend entirely on how you plan to use the results. Something with a p value of a staggering 70 % (i.e. it's more likely not true than true) is significant if the payoff is good when it's true, and the cost is small when it's not true.

And 70 % is very far from 5 %!

Then again, if the payoff is tiny compared to the cost, you might ask for a p-value of less than 0.01 %, in order for it to make sense to take the chance on it.

Think like a poker player: a hand that has 1/4 chance of winning needs better than a 3-to-1 payout when it wins to be playable. Conversely, when the pot offers you a 3-to-1 payout, you better make sure your hand has more than a 1/4 chance of winning.
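
The poker arithmetic can be checked in a few lines. A minimal sketch, assuming a 1-unit stake that is lost on a losing hand and a payout quoted as odds-to-1; the function names are made up for illustration:

```python
def expected_value(p_win: float, payout_odds: float, stake: float = 1.0) -> float:
    """Average profit per hand: win payout_odds * stake with probability p_win,
    lose the stake otherwise."""
    return p_win * payout_odds * stake - (1.0 - p_win) * stake


def break_even_probability(payout_odds: float) -> float:
    """Win probability at which the expected value is exactly zero."""
    return 1.0 / (payout_odds + 1.0)


# A 1/4 hand at exactly 3-to-1 breaks even; better odds make it playable.
print(expected_value(0.25, 3.0))   # 0.0
print(expected_value(0.25, 4.0))   # 0.25
print(break_even_probability(3.0)) # 0.25
```

The same calculation gives the evidence threshold for any bet: the larger the payout relative to the stake, the lower the probability at which acting on the result is still worthwhile.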

They didn't claim it was real, did they? They just read out the result, which was lower but not in a significant way. I've seen hundreds of papers do the same.
"was lower but not in a significant way"

By convention, this means "indistinguishable from", so reporting that it is lower is an unsupported claim. They would be equally justified in reporting that it was higher, i.e. not at all.

It was lower though, just not significantly so (depending on your threshold for significance). That's the standard way of reporting it. You can't just carve out part of the sentence and take issue with it; the sentence in its entirety is accurate.
Yes I agree that the sentence is accurate.

Perhaps I’m too zealous about it, and maybe the conventions vary, but I was trained to avoid using words like ‘lower’ here.

> which was lower but not in a significant way

But if "result [was] lower but not in a significant way" means "result was not proven to be lower", how does saying "lower but not really lower" ever make any sense? It seems to me that such a nonsensical formulation ought never to be used by anyone.

Because significance thresholds can vary pretty dramatically. Plenty of experiments in physics, for instance, have reported results even though they didn't yet reach the 5 sigma threshold (p ≈ 3×10^-7). In physics something can be highly likely but still not 'significant' enough to warrant a discovery claim. They simply couch it as: this was the result, and even though it isn't 'significant', the high likelihood may warrant additional research. Reporting a binary significant/not significant is far less useful.
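
For reference, a sigma threshold converts to a p-value through the tail of the normal distribution. A small sketch using only the standard library, assuming the one-sided convention commonly used in particle physics (`one_sided_p` is a made-up helper name):

```python
import math


def one_sided_p(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


for s in (2, 3, 5):
    print(f"{s} sigma: one-sided p = {one_sided_p(s):.2e}")
```

For 5 sigma this comes out at roughly 2.9e-7, several orders of magnitude stricter than 0.05.
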
> "Not significant" means that the probability is >=5% their result was obtained by chance.

Ackchually... p-value represents the probability that results like these would be observed even if there was no difference between the two choices, simply due to chance.
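
That definition can be sketched as a permutation test: shuffle the group labels many times and count how often a difference in means at least as large as the observed one shows up by chance alone. The readings below are invented for illustration, not data from the study, and `permutation_p_value` is a hypothetical helper name.

```python
import random


def permutation_p_value(a, b, n_shuffles=10000, seed=0):
    """Estimate how often a label shuffle gives a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    n_a, n_b = len(a), len(b)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            count += 1
    return count / n_shuffles


# Made-up systolic readings (mmHg) for two small groups of listeners.
systolic_440 = [122, 118, 130, 125, 119, 127, 121, 124]
systolic_432 = [120, 117, 128, 126, 115, 123, 119, 122]

p = permutation_p_value(systolic_440, systolic_432)
print(f"difference in means: {sum(systolic_440) / 8 - sum(systolic_432) / 8:.2f} mmHg")
print(f"permutation p-value: {p:.3f}")
```

In this invented data the 432 Hz group's mean is 2 mmHg lower, yet shuffled labels produce a gap that large quite often, so the p-value is nowhere near 0.05.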

The entire point of "non-significant" is that you cannot say that anything happened.

The authors are making an extremely serious mistake here and you shouldn't be pushing back.