by MrEldritch 2317 days ago
hm. upon reading the paper, this is sounding very suspicious.

> The study was truly blind. Although the observers were acquainted with our previous studies on magnetic alignment in animals and could have consciously or unconsciously biased the results, no one, not even the coordinators of the study, hypothesized that expression of alignment could have been affected by the geomagnetic situation, and particularly by such subtle changes of the magnetic declination. The idea leading to the discovery of the correlation emerged after sampling was closed and the first statistical analyses (with rather negative results, cf.Figure 1) had been performed.

Like, am I reading this wrong, or are they straight-up saying "we couldn't achieve statistical significance on our original hypothesis, so we just went fishing for correlations until one of them came up significant, and it turned out to be magnetic declination"?

5 comments

Well, it would be p-hacking if you tried 1,032 different hypotheses until you got one that passed your threshold. There's quite a lot of scientific history (e.g. Kepler's discovery that planets went in elliptical orbits) that would have to be thrown out if you decided you could never use data for anything other than the original hypothesis. Kepler didn't even collect the data, much less collect it with the idea that the planetary orbits were elliptical.

Having said that, the result smells (pun intended) bad, just because I cannot think of any plausible reason for a non-migrating animal to align with the magnetic field, when defecating or at any other time.

> any plausible reason for a non-migrating animal to align with the magnetic field, when defecating or at any other time.

Snow foxes seem to hunt better when oriented in direction of the magnetic north.

https://m.phys.org/news/2011-01-predation-foxes-aided-earth-... https://youtu.be/D2SoGHFM18I

This is true, and a valid point. The way they phrased it does make me feel more than a little suspicious, nonetheless.

(Besides, there's some other oddity there, like that apparently the alignment only matters when the magnetic field is calm)

When the magnetic field is non-calm, that is probably due to a space weather event that is geo-effective and inducing large currents in the ground. The local magnetic field can then be significantly distorted depending on local conductivity. So to me that is not an oddity.
You're right to still feel suspicious. Who's to say they didn't try 1,000 different post-hoc ideas? They declare only one; there may have been others. I'd be looking for preceding research and any published protocols, if I wasn't on mobile and didn't think it would be fruitless.
> any plausible reason for a non-migrating animal to align with the magnetic field, when defecating or at any other time.

I got my dog a few years back when she was just a pup. Over the years, she's done things that she was never taught how to do (swim, hunt, bury her food); she just knew how to do them instinctually. I believe something like this falls under that category.

And for the record, she took a crap this morning and was pointing directly north/south.

I mean, humans can detect magnetic fields as well. There are even languages without relative egocentric positions like left/right, only north, south, east, and west. Given all that, I don't think it's out there that dogs sense it and like being aligned when they're trying to poop.

https://www.sciencemag.org/news/2019/03/humans-other-animals...

https://en.wikipedia.org/wiki/Guugu_Yimithirr_language

The predominance of geographic directions in Guugu Yimithirr has nothing to do with magnetic sensing; humans know what north and south are from sun positions and memory.
> "The study was truly blind."

I'd argue the study would only be truly blind if the dogs were blind. If blind dogs also oriented themselves north-south, then that would prove that they weren't using visual cues for alignment, such as the position of the sun.

You can feel the heat of the sun
P-hacking only applies to proofs. This study does two things: it falsifies a previous hypothesis fair and square, with no p-hacking, and it postulates another hypothesis, an activity where the concept of p-hacking does not even apply.

The only wrong party here is the one that reported the study found something.

Yes, that's a straight up admission of p-hacking.

Also, the weasel word "truly" signifies deceit.

I don't think that's necessarily a bad thing, though my opinion might've been influenced somewhat by 538.

https://fivethirtyeight.com/features/science-isnt-broken/

Why not? Science that insists on hypotheses written down beforehand is cargo-cult science. Observation is the first and most productive science. Double-blind experiments are to cement gains.
basically, because once you start trying multiple hypotheses on the same dataset, the math used to determine "is this conclusion real, or am I just fooling myself" begins to break down.

The statistical significance threshold usually used is p<0.05, meaning that something is (generally; this is beginning to change since the replication crisis) considered to be a real discovery if a result at least that extreme would have less than a 1/20 chance of showing up by chance alone under the chosen model, i.e. if there were no real effect.

As soon as you start trying multiple hypotheses, then that 1/20 chance of being a false positive begins to become meaningless. If you can just keep rolling d20s until one of them comes up with a critical hit, then you can easily generate false positives that still look very robust.
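To put a rough number on the d20 analogy, here is a minimal sketch. It assumes the tests are independent and that every hypothesis tried is actually false (no real effect anywhere), which is a simplification; the function name is just for illustration.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across `num_tests`
    independent tests when no real effect exists (assumed)."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 5, 20, 100):
    print(f"{k:>3} tests: {familywise_error_rate(k):.2f}")
# prints 0.05, 0.23, 0.64, 0.99 for 1, 5, 20, 100 tests
```

So with 20 rolls you are more likely than not to get at least one "significant" result from pure noise.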

This is exactly the sort of bad science - p-hacking, fishing expeditions, and the garden of forking paths - that led to the replication crisis. (And that makes sense, as this paper is from 2013, and predates the widespread discovery of the crisis)

The math continues to work out as long as you use the right approach. You have to collect twice as much data, and then set half of it aside at random without examining it. Then you can do whatever perverse p-hacking multi-modeling curve-fitting whatever to the half you kept until you reach a hypothesis, then check it against the half you set aside to recover the statistical significance you lost by using techniques that may have overfit the first half. Unsurprisingly, the math works out because this approach is isomorphic to collecting the first half, studying it to form a hypothesis, then conducting a proper pre-hypothesized experiment to collect the second half. Validation via holdout sets is the same approach used in machine learning and elsewhere to prevent models from overfitting data.
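The split-and-hold-out idea can be sketched roughly like this. Everything here is made up for illustration: the data is pure random noise, the 50 candidate predictors stand in for "whatever post-hoc ideas you try", and plain Pearson correlation stands in for a real analysis.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation, written out with the standard library only."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def explore_then_confirm(outcome, candidates, rng):
    """Fish freely on one random half, then check the single chosen
    hypothesis once on the untouched half."""
    rows = list(range(len(outcome)))
    rng.shuffle(rows)                      # set half aside at random
    half = len(rows) // 2
    explore, holdout = rows[:half], rows[half:]

    def corr_on(subset, j):
        return correlation([candidates[j][i] for i in subset],
                           [outcome[i] for i in subset])

    # "Fishing": keep whichever candidate looks best on the exploration half.
    best = max(range(len(candidates)), key=lambda j: abs(corr_on(explore, j)))
    return best, corr_on(explore, best), corr_on(holdout, best)


rng = random.Random(0)                     # fixed seed, only for repeatability
n_rows, n_candidates = 200, 50
outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
candidates = [[rng.gauss(0, 1) for _ in range(n_rows)]
              for _ in range(n_candidates)]

best, r_explore, r_holdout = explore_then_confirm(outcome, candidates, rng)
print(f"candidate {best}: r={r_explore:.2f} on explored half, "
      f"r={r_holdout:.2f} on holdout")
```

Since nothing real is in this data, the correlation picked on the explored half will usually look inflated, while the holdout number is an honest single test of that one hypothesis.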
This is true! I was trying to simplify things a bit for a basic explanation, but I fear I oversimplified. I just meant that the generally used math breaks down; if you're aware of the problem, you can correct for it, but very often people don't.
Stating it more plainly, what you wrote was incorrect, and unfairly tarred a statement that was, in fact, correct.
Thanks! For someone that didn't understand why this was considered p-hacking, that made a whole lot of sense.
p<0.05 is also cargo-cult science, and is much more responsible for the replication crisis -- along with biased sampling (pop. 18-22 yo US psych students).

It is also why we see repeated, spurious insistence that anti-depressants don't do anything.

Experiment design is a subtle skill.

You seem to be under the impression that a study like this gives a hard "yes/no" answer as to whether some hypothesis is true. That is not the case, nor is it ever the case with most studies like these. Instead, you need to do some sort of statistical hypothesis test.

As other comments have pointed out, once you start testing multiple hypotheses on the same dataset, you cannot apply the same significance threshold that you would if you had just begun with a single hypothesis before observing the data. Instead, you need to apply some sort of correction that takes into account the number of hypotheses being tested:

https://en.wikipedia.org/wiki/Family-wise_error_rate#Control...
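As a concrete example of such a correction, here is a minimal sketch of the Bonferroni method, one of the simplest ways to control the family-wise error rate. The p-values are invented for illustration and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values stay significant after dividing the
    threshold by the number of hypotheses tested."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Made-up p-values from four hypotheses tried on the same dataset.
p_values = [0.04, 0.20, 0.001, 0.03]
print(bonferroni_significant(p_values))
# prints [False, False, True, False]: with 4 tests the bar is 0.0125, not 0.05
```

Note that 0.04 and 0.03 would each have passed an uncorrected p<0.05 test on their own.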

No. If you collect data and then hunt for "significant" results in it you are guaranteed to find spurious results. This is one of the most basic truths of statistics.
You are confusing hypothesis generation with hypothesis testing. Both are science, but only one is a reliable way to determine truth.
Probable claims. Not truth.
In the non-Platonic real world, truth is claims that we believe have high probability.
Not if you want to claim statistical significance. The math behind this method is based on defining the hypothesis before seeing the data (and even then it's usually very weak evidence of a tiny signal within the noise).
xkcd explains it better than I can. Basically, if you run 20 tests at a p-value threshold that gives 95% certainty, you're probably going to "discover" at least one falsehood.

https://xkcd.com/882/