Data detectives spotted fake numbers in a widely cited paper

	Data detectives spotted fake numbers in a widely cited paper (economist.com)
	104 points by proxyswapi 1752 days ago

7 comments

snakeboy 1752 days ago

I think the original blog post [0] and Andrew Gelman's discussion of it [1] are both better sources for technical details and some historical context.

In particular, this is not the first such issue for Dan Ariely. As Gelman points out, he has a history of sketchy scientific ethics, like doing media tours for studies that he knows failed to replicate.

[0] http://datacolada.org/98

[1] https://statmodeling.stat.columbia.edu/2021/08/19/a-scandal-...

ZeroGravitas 1752 days ago

The datacolada post makes the very reasonable request that all data should be released, and scientists should make that a standard thing to do by doing it themselves and requesting others do it.

It feels like this could be applied retroactively too.

In this case the 2012 authors still had the data that they released in 2020 which is how the analysis got done that showed evidence of fraud. Might be worth just asking a whole bunch of people to release data they previously hadn't and collectively putting some time and effort into that.

nomoreplease 1751 days ago

Is there a way to make it licensed? Like “if you use our paper or data, then you also need to publish yours”.

huitzitziltzin 1751 days ago

It’s not possible to release data in all circumstances. If you work with health data (I have worked with birth certificates, EMRs, inpatient discharge abstracts, drug prescription histories and other data) you can’t post it publicly. You have to promise not to include a table in the paper with a cell size of fewer than ten individuals!

For what it’s worth, the Trump administration attempted to make issuing new health and environmental regs harder by requiring public data disclosure. They did this entirely because they knew that much of the data could not be disclosed. So if you were studying, eg, the effects of some pollutant on a health outcome using private data, you wouldn’t be able to rely on that study in a regulatory context bc the data could not be published.

It’s a worthy idea, but there are exceptions for good reasons.
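
For what it's worth, the cell-size rule mentioned above is simple to sketch. This is a minimal Python illustration, assuming a table stored as nested dicts of counts; the function name, the `MIN_CELL_SIZE` constant, and the sample numbers are all hypothetical, and real disclosure rules are more involved than a single threshold.

```python
MIN_CELL_SIZE = 10  # threshold from the comment above: no cells with fewer than ten individuals


def suppress_small_cells(table, threshold=MIN_CELL_SIZE):
    """Return a copy of the table with counts below the threshold replaced by None."""
    return {
        row: {col: (count if count >= threshold else None) for col, count in cells.items()}
        for row, cells in table.items()
    }


# Made-up counts, purely for illustration.
example = {
    "clinic_a": {"treated": 42, "untreated": 7},
    "clinic_b": {"treated": 10, "untreated": 118},
}

print(suppress_small_cells(example))
```

Only the `clinic_a` / `untreated` cell falls under the threshold here, so it is the only one blanked out.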

boulos 1752 days ago

One amusing point is that much of Gelman's post has an error itself (that someone pointed out in the comments a couple weeks ago): the NPR interview was in 2017, so "Ariely, as a coauthor of this article, had to have known for at least half a year before the NPR story that this finding didn’t replicate." is incorrect. Maybe Gelman should retract that part :).

Annoyingly, the NPR transcript at [1] only has a small note "(SOUNDBITE OF ARCHIVED NPR BROADCAST)" at the top with no indication of when (the audio doesn't seem to have a date either). The podcast show notes are apparently the only recordation of the date. [2]

[1] https://www.npr.org/transcripts/805808486

[2] https://pbs.twimg.com/media/E9UQv8LWEAkpoR0?format=jpg&name=...

1cvmask 1751 days ago

And how it funnily relates to the Gell-Mann amnesia effect. Although named after physicist Murray Gell-Mann:

https://www.epsilontheory.com/gell-mann-amnesia/

hrhdkdlfnrne 1752 days ago

A lot of science today is basically parallel construction.

You start with a sexy story that you know will get you a lot of press, like "promising you will be honest actually makes you behave in an honest way" and then you just make that paper happen, however you can.

Under the publish or perish system, scientists don't have time to actually research the topic, and imagine if it fails to confirm - you just wasted a lot of time and didn't publish anything. Too risky, it's much easier to just fake it till you make it, especially since you know peer reviewers never ever will accuse you of fraud.

Any reviewer accusing a scientist of fraud will just be excluded from the community, since it's very important to uphold the narrative that "scientists are always honest, they never cheat like politicians, which is why we must always trust scientists and never question them".

thrwyoilarticle 1752 days ago

There's a lot of value being abused in the term 'science'. Science is a highly valued concept but it's the result of following the scientific method, not the output of anyone with a postgrad.

dr_dshiv 1752 days ago

Science is what scientists do, like politics is what politicians do? Either science can be critiqued as a social construct or it is an unimpeachable Platonic aspiration. I can see both perspectives. But communicating that science itself is a somewhat messy social phenomenon might be better as a long-term message for the public.

guerrilla 1752 days ago

Politics is definitely not defined as what politicians do, and science is, as the GP said, when someone follows the scientific method, which is something that happens all the time, far from a Platonic aspiration.

jbjohns 1751 days ago

Is there such a method?

https://www.discovermagazine.com/planet-earth/the-scientific...

guerrilla 1751 days ago

Yes, despite the demarcation problem, I think there is.

https://plato.stanford.edu/entries/scientific-method/

And I think the demarcation problem can in some ways be solved by distinguishing on predictivity.

dr_dshiv 1751 days ago

Right. Nice article. Tldr; no, not really, "the scientific method" is rhetoric.

Though I will say that they missed Robert Hooke's essay on the scientific method. I swear no one knows about this (even though Hooke was a founding and seminal member of the English Royal Society) because Hooke sounds insane. Who makes titles like this? I love it:

A scheme, or idea of the present state of natural philosophy, and how its defects may be remedied by a methodical proceeding in the making experiments and collecting observations whereby to compile a natural history, as to the solid basis for the superstructure of philosophy

simonh 1752 days ago

I think it's important to bear in mind that even with the system working perfectly well, and perfect ethics, we should expect to see a lot of papers published with false results.

Let's say there are 200 propositions that we want to test and that are candidates for publication, that 20 of them are true, and that our error rate is 5%. That means when we test the 20 that are true, 19 of them will be accurately shown to be correct and 1 will be erroneously found false.

However, when the other 180 propositions are tested, 5% of them will be erroneously found to be true; that's 9 propositions. This means we will end up with 28 'successful' studies that make it into prestigious journals, about 1/3 of which are false positives.

And as I said, that's if the system works perfectly with no fraud whatsoever. Throw in some human error and it's not surprising if a fair few studies start to look pretty dodgy. Add in some genuine fraud too and you've got a full-on replication crisis with all the trimmings.
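
To make the arithmetic above easy to play with, here is a minimal Python sketch. It assumes, as the comment does, that the same 5% error rate applies to false positives and false negatives alike; the function name and parameters are invented for illustration.

```python
def publication_outcomes(n_propositions, n_true, error_rate):
    """Expected counts when every proposition is tested exactly once."""
    n_false = n_propositions - n_true
    true_positives = n_true * (1 - error_rate)   # true and correctly confirmed
    false_negatives = n_true * error_rate        # true but wrongly rejected
    false_positives = n_false * error_rate       # false but wrongly confirmed
    published = true_positives + false_positives  # everything that "worked"
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "published": published,
        "false_share": false_positives / published,
    }


# The numbers from the comment: 200 propositions, 20 true, 5% error rate.
for name, value in publication_outcomes(200, 20, 0.05).items():
    print(f"{name}: {value:.2f}")
```

With these inputs the false share is 9/28, roughly 0.32, which is the "about 1/3" above. Raising the fraction of true propositions pushes that share down quickly, which is one way to see why the prior plausibility of a hypothesis matters so much.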

motohagiography 1752 days ago

Fraud like this is next-order despicable because of the knock-on effects it has on knowledge and policy. I don't know that stating that is going to have an effect on people who engage in fraud as a matter of course or as passive parties to it, but here we are. The incentives here are what make most research an unsuitable substitute or appeal to authority for public policy. It's a proxy for a petty elitism, and it means that to "trust science" just becomes scientistic nihilism. After all, these days you don't need popular consent when you have scientists.

Reading reports like this, I can see why there has been so much criticism in recent years of the concept of the banality of evil, and why so many researchers affect concern for the environment, because more than anyone, they seem to understand what it means to be responsible for poisoning an ecosystem and they need to get out in front of those narratives. Sure, it's just a bit of fraud in a journal, just like it's just a bit of PCB or mercury in a lake, and only a minority of the population who will be impacted, but someone has to call it out as nihilism, or we're a party to it as well.

As much as I dislike blockchain ideas, it makes me think a blockchain DAG of metadata about the integrity of published papers for citations, reproducibility, and evidence of certain types of fraud would rebalance the incentives a bit.

atoav 1752 days ago

I think it is important not to extrapolate from a replication crisis in one field (e.g. psychology) to all the other sciences, because the picture painted by this (to my knowledge) doesn't accurately describe the underlying practices.

btilly 1751 days ago

No, it is exactly the opposite.

The well-documented failings of psychology should cause us to take a closer look at other fields to see if they have similar problems. If they don't, then great. If they do, then fix them.

For example people studying metascience have found that a lot of medical research is of questionable accuracy. It is not as bad as psychology. But it is bad and I'm glad that people are taking this problem seriously.

atoav 1751 days ago

Good point. We should be careful in other fields as well.

I came at it from another direction here — there are people who are like: "Look at the replication crisis in psychology — this is proof science cannot be trusted in general". So what I meant to argue here is that this conclusion cannot be drawn automatically, not that we shouldn't scrutinize other fields (we should!)

hrhdkdlfnrne 1751 days ago

In medical sciences it is worse. In one attempt to replicate "landmark" cancer studies, 89% failed to replicate.

https://www.nature.com/articles/483531a

PaulDavisThe1st 1751 days ago

> A lot of science today is basically parallel construction. You start with a sexy story that you know will get you a lot of press, like "promising you will be honest actually makes you behave in an honest way" and then you just make that paper happen, however you can.

Gigantic accusation, zero evidence.

If this was a paper, rather than an HN comment, I'd say there was every chance that it would be self-illustrating.

blunte 1752 days ago

Seems to me...

Any author of a published paper who will not stand behind the paper should have their name removed.

When there is just one name left, that person either accepts responsibility for the content, or they too disavow it and get removed.

When there are no names left, the paper is retracted.

new_guy 1752 days ago

More than that, they should have their degree revoked too.

The spam in these journals puts Buzzfeed to shame.

dang 1751 days ago

Previous threads on this. Others?

A study on dishonesty was based on fraudulent data - https://news.ycombinator.com/item?id=28271805 - Aug 2021 (42 comments)

Noted study in psychology fails to replicate, crumbles with evidence of fraud - https://news.ycombinator.com/item?id=28264097 - Aug 2021 (102 comments)

A Big Study About Honesty Turns Out to Be Based on Fake Data - https://news.ycombinator.com/item?id=28257860 - Aug 2021 (90 comments)

Evidence of fraud in an influential field experiment about dishonesty - https://news.ycombinator.com/item?id=28210642 - Aug 2021 (51 comments)

feikname 1752 days ago

It always surprises me how people don't put in even minimal effort to hide this stuff.

And no one notices (or they don't care about the obviously suspicious aspect).

Do scientists actually read what they cite?

uuidgen 1752 days ago

Did YOU even check what you cite?

The AUTHORS of the original paper got a dataset from a company. They didn't assume the fraud from the start and published the paper based on it.

Later, when they tried to analyze the issue more in-depth, they couldn't replicate the results. THE ORIGINAL AUTHORS PUBLISHED a paper about a failure to replicate. It was just then that someone looked at the original data and found that it was faked.

smitty1e 1752 days ago

Division of Labour[1] is powerful. The need is to incentivise feedback loops to QA the data on the front end.

The fear of reputational implosion is apparently insufficient.

[1] https://en.m.wikipedia.org/wiki/Division_of_labour

feikname 1751 days ago

> Did YOU even check what you cite?

I did completely read what was available to me without having an account.

> The AUTHORS of the original paper got a dataset from a company. They didn't assume the fraud from the start and published the paper based on it.

My comment is not about who the culprit is or isn't. Indeed, I don't mention anything about it.

Rather, it's about how, as the title says, a WIDELY cited paper has fabricated data following rather (IMO) obvious red flag patterns and none of the people -who cited the paper- raised issues about that.

Thus, in the parent post, I questioned whether scientists read the papers they cite. The question is not a judgment; I'm just truly curious since I'm not part of formal academia, just an undergraduate.

bobcostas55 1752 days ago

It seems highly likely that Ariely did it, not the company.

fighterpilot 1751 days ago

Why do you say that?

bobcostas55 1751 days ago

He created the Excel file. If he wanted to clear his name he could publish the original data as it was sent from the company, but he hasn't done so. And the company obviously has no incentive to falsify the data.

smitop 1752 days ago

If they did a good job hiding it then you wouldn't know about it. For every case like this where they made an obvious mistake, there are cases where they didn't, and nobody noticed. But you only tend to hear about the ones with obvious mistakes.

ackbar03 1752 days ago

> Do scientists actually read what they cite?

No. What, are you kidding me? There are typically over 50 papers referenced in a publication; no way I have the time to read all of them carefully.

matheusmoreira 1752 days ago

> Do scientists actually read what they cite?

Yes, but few actually scrutinize the methodology of the studies. Statistics is really hard. It's easier to assume peer reviewers would have rejected the paper if it was bad.

samhw 1752 days ago

Benford's law (https://en.wikipedia.org/wiki/Benford%27s_law) is a pretty well-known test for fraudulent numbers. Of course, it's not infallible and (depending on the nature of the fake) it may be possible to tailor the numbers to 'pass' it, but it's a good heuristic. I'd be curious to know whether it would have detected this - and, if so, whether it was indeed used.
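
For anyone who wants to try the heuristic, here is a minimal Python sketch of a first-digit check. It compares observed leading digits with the Benford proportions log10(1 + 1/d) using a plain chi-square statistic. The function names and the two toy datasets are made up for illustration; this is not the analysis anyone ran on the paper's data.

```python
import math
from collections import Counter
from decimal import Decimal


def leading_digit(x):
    """First significant digit of a nonzero number (sign and scale ignored)."""
    # Decimal's coefficient carries no leading zeros, so digits[0] is the first significant digit.
    return Decimal(str(x)).as_tuple().digits[0]


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    n = len(digits)
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford share of digit d, times sample size
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Toy data: powers of two track Benford closely; 100..999 has uniform leading digits.
powers_of_two = [2 ** k for k in range(1, 200)]
uniform_three_digit = list(range(100, 1000))

print("powers of two:", benford_chi_square(powers_of_two))
print("uniform 100-999:", benford_chi_square(uniform_three_digit))
```

As a rough yardstick, the 5% critical value of a chi-square distribution with 8 degrees of freedom is about 15.51. The caveats above still apply: data confined to a narrow range will fail this check without any fraud being involved, so a large statistic is a prompt to look closer, not a verdict.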

tasogare 1752 days ago

> Do scientists actually read what they cite?

It depends on one's probity, but yes the phenomenon is widespread. There are numerous reasons to cite a paper and just read it superficially: padding the references list, pleasing a reviewer by adding a paper he recommended, citing friends, etc. In my own lab they frequently cite a certain theory which, if you actually read about it, has nothing to do with what they are doing.

vidarh 1752 days ago

The problem is often that reading it is insufficient. Unless you try to replicate, it may well look plausible, and often missing data or details create barriers, so you need to want to replicate really badly to put in the effort.

iwebdevfromhome 1752 days ago

Data detectives sounds like a great job, how do I become one?

The pay is amazing.

Become a researcher and do peer review.

oenetan 1752 days ago

https://archive.is/YQguM
