I think the original blog post [0] and Andrew Gelman's discussion of it [1] are both better sources for technical details and some historical context.
In particular, this is not the first such issue for Dan Ariely: as Gelman points out, he has a history of sketchy scientific ethics, like doing media tours for studies that he knows failed to replicate.
The datacolada post makes the very reasonable request that all data should be released, and scientists should make that a standard thing to do by doing it themselves and requesting others do it.
It feels like this could be applied retroactively too.
In this case the 2012 authors still had the data, which they released in 2020, and that is how the analysis that showed evidence of fraud got done. Might be worth just asking a whole bunch of people to release data they previously hadn't and collectively putting some time and effort into that.
It’s not possible to release data in all circumstances. If you work with health data (I have worked with birth certificates, EMRs, inpatient discharge abstracts, drug prescription histories and other data) you can’t post it publicly. You have to promise not to include a table in the paper with a cell size of fewer than ten individuals!
For what it’s worth, the Trump administration attempted to make issuing new health and environmental regs harder by requiring public data disclosure. They did this entirely because they knew that much of the data could not be disclosed. So if you were studying, eg, the effects of some pollutant on a health outcome using private data, you wouldn’t be able to rely on that study in a regulatory context bc the data could not be published.
It’s a worthy idea, but there are exceptions for good reasons.
One amusing point is that much of Gelman's post has an error itself (that someone pointed out in the comments a couple weeks ago): the NPR interview was in 2017, so "Ariely, as a coauthor of this article, had to have known for at least half a year before the NPR story that this finding didn’t replicate." is incorrect. Maybe Gelman should retract that part :).
Annoyingly, the NPR transcript at [1] only has a small note "(SOUNDBITE OF ARCHIVED NPR BROADCAST)" at the top with no indication of when (the audio doesn't seem to have a date either). The podcast show notes are apparently the only recordation of the date. [2]
A lot of science today is basically parallel construction.
You start with a sexy story that you know will get you a lot of press, like "promising you will be honest actually makes you behave in an honest way" and then you just make that paper happen, however you can.
Under the publish or perish system, scientists don't have time to actually research the topic, and imagine if it fails to confirm - you just wasted a lot of time and didn't publish anything. Too risky, it's much easier to just fake it till you make it, especially since you know peer reviewers never ever will accuse you of fraud.
Any reviewer accusing a scientist of fraud will just be excluded from the community, since it's very important to uphold the narrative that "scientists are always honest, they never cheat like politicians, which is why we must always trust scientists and never question them".
There's a lot of value being abused in the term 'science'. Science is a highly valued concept but it's the result of following the scientific method, not the output of anyone with a postgrad.
Science is what scientists do, like politics is what politicians do? Either science can be critiqued as a social construct or it is an unimpeachable Platonic aspiration. I can see both perspectives. But communicating that science itself is a somewhat messy social phenomenon might be better as a long-term message for the public.
Politics is definitely not defined as what politicians do, and science is, as the GP said, when someone follows the scientific method, which is something that happens all the time, far from a Platonic aspiration.
Right. Nice article. Tldr; no, not really, "the scientific method" is rhetoric.
Though I will say that they missed Robert Hooke's essay on the scientific method. I swear no one knows about this (even though Hooke was a founding and seminal member of the English Royal Society) because Hooke sounds insane. Who makes titles like this? I love it:
A scheme, or idea of the present state of natural philosophy, and how its defects may be remedied by a methodical proceeding in the making experiments and collecting observations whereby to compile a natural history, as to the solid basis for the superstructure of philosophy
I think it's important to bear in mind that even with the system working perfectly well, and perfect ethics, we should expect to see a lot of papers published with false results.
Let's say there are 200 propositions we want to test and are candidates for publication, that 20 of them are true and that our error rate is 5%. That means when we test the 20 that are true, 19 of them will be accurately shown to be correct and 1 will be erroneously found false.
However when the other 180 propositions are tested 5% of them will be erroneously found to be true, that's 9 propositions. This means we will end up with 28 'successful' studies that make it into prestigious journals, about 1/3 of which are false positives.
And as I said, that's if the system works perfectly with no fraud whatsoever. Throw in some human error and it's not surprising if a fair few studies start to look pretty dodgy. Add in some genuine fraud too and you've got a full-on replication crisis with all the trimmings.
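The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch of the comment's simplified model: it assumes a single error rate applied to both true and false propositions (as the comment does), and the function name and dictionary keys are made up for illustration.

```python
def publication_outcomes(n_propositions, n_true, error_rate):
    """Expected test outcomes under the simplified model in the comment above.

    Assumption: the same error rate applies when testing a true proposition
    (a miss) and when testing a false one (a false positive).
    """
    n_false = n_propositions - n_true
    true_positives = n_true * (1 - error_rate)   # true, correctly confirmed
    false_negatives = n_true * error_rate        # true, wrongly rejected
    false_positives = n_false * error_rate       # false, wrongly confirmed
    published = true_positives + false_positives # every "successful" study
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "published": published,
        "false_share": false_positives / published,
    }


if __name__ == "__main__":
    result = publication_outcomes(n_propositions=200, n_true=20, error_rate=0.05)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With the numbers from the comment (200 propositions, 20 true, 5% error rate) this gives 19 true positives and 9 false positives, so 9 of the 28 published results, roughly a third, are false. Changing `n_true` shows how strongly the false share depends on how many of the tested propositions were true to begin with.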
Fraud like this is next-order despicable because of the knock-on effects it has on knowledge and policy. I don't know that stating that is going to have an effect on people who engage in fraud as a matter of course or as passive parties to it, but here we are. The incentives here are what makes most research an unsuitable substitute or appeal to authority for public policy. It's a proxy for a petty elitism, and it means that to "trust science" just becomes scientistic nihilism. After all, these days you don't need popular consent when you have scientists.
Reading reports like this, I can see why there has been so much criticism in recent years of the concept of the banality of evil, and why so many researchers affect concern for the environment, because more than anyone, they seem to understand what it means to be responsible for poisoning an ecosystem and they need to get out in front of those narratives. Sure, it's just a bit of fraud in a journal, just like it's just a bit of PCB or mercury in a lake, and only a minority of the population who will be impacted, but someone has to call it out as nihilism, or we're a party to it as well.
As much as I dislike blockchain ideas, it makes me think a blockchain DAG of metadata about the integrity of published papers for citations, reproducibility, and evidence of certain types of fraud would rebalance the incentives a bit.
I think it is important not to extrapolate from a replication crisis in one field (e.g. psychology) to all the other sciences, because the picture painted by this (to my knowledge) doesn't accurately describe the underlying practises.
The well-documented failings of psychology should cause us to take a closer look at other fields to see if they have similar problems. If they don't, then great. If they do, then fix them.
For example people studying metascience have found that a lot of medical research is of questionable accuracy. It is not as bad as psychology. But it is bad and I'm glad that people are taking this problem seriously.
Good point. We should be careful in other fields as well.
I came at it from another direction here — there are people who are like: "Look at the replication crisis in psychology — this is proof science cannot be trusted in general". So what I meant to argue here is that this conclusion cannot be drawn automatically, not that we shouldn't scrutinize other fields (we should!)
> A lot of science today is basically parallel construction. You start with a sexy story that you know will get you a lot of press, like "promising you will be honest actually makes you behave in an honest way" and then you just make that paper happen, however you can.
Gigantic accusation, zero evidence.
If this was a paper, rather than an HN comment, I'd say there was every chance that it would be self-illustrating.
The AUTHORS of the original paper got a dataset from a company. They didn't assume the fraud from the start and published the paper based on it.
Later, when they tried to analyze the issue more in-depth, they couldn't replicate the results. THE ORIGINAL AUTHORS PUBLISHED a paper about a failure to replicate. It was just then that someone looked at the original data and found that it was faked.
I did completely read what was available to me without having an account.
> The AUTHORS of the original paper got a dataset from a company. They didn't assume the fraud from the start and published the paper based on it.
My comment is not about who the culprit is or isn't. Indeed, I don't mention anything about it.
Rather, it's about how, as the title says, a WIDELY cited paper has fabricated data following rather (IMO) obvious red flag patterns and none of the people -who cited the paper- raised issues about that.
Thus, in the parent post I questioned whether or not scientists read the papers they cite.
The question is not a judgment, I'm just truly curious since I'm not part of formal academia, just an undergraduate.
He created the excel file. If he wanted to clear his name he could publish the original data as it was sent from the company, but he hasn't done so. And the company obviously has no incentive to falsify the data.
If they did a good job hiding it then you wouldn't know about it. For every case like this where they made an obvious mistake, there are cases where they didn't, and nobody noticed. But you only tend to hear about the ones with obvious mistakes.
No. What, are you kidding me? There's like over 50 papers referenced in a typical publication, no way I have the time to read all of them carefully.
Yes, but few actually scrutinize the methodology of the studies. Statistics is really hard. It's easier to assume peer reviewers would have rejected the paper if it was bad.
Benford's law (https://en.wikipedia.org/wiki/Benford%27s_law) is a pretty well-known test for fraudulent numbers. Of course, it's not infallible and (depending on the nature of the fake) it may be possible to tailor the numbers to 'pass' it, but it's a good heuristic. I'd be curious to know whether it would have detected this - and, if so, whether it was indeed used.
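For anyone who wants to try this kind of check, here is a rough sketch of a first-digit test in Python. It is not the analysis used in the datacolada post; it only compares observed leading digits with the Benford proportions log10(1 + 1/d) using a chi-square statistic. The function names and the two sample datasets are made up for illustration, and the sketch assumes positive integer data.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_DF8_05 = 15.507


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """Leading decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def benford_chi2(values):
    """Chi-square statistic of observed leading digits against Benford's law.

    Assumption: `values` are integers; zeros are skipped because they have
    no leading digit in the range 1..9.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Hypothetical datasets: powers of 2 are known to track Benford closely,
    # while evenly spread three-digit numbers do not.
    benford_like = [2 ** k for k in range(1, 201)]
    evenly_spread = list(range(100, 1000))
    for name, data in [("powers of 2", benford_like), ("100..999", evenly_spread)]:
        stat = benford_chi2(data)
        verdict = "deviates from" if stat > CHI2_CRITICAL_DF8_05 else "is consistent with"
        print(f"{name}: chi2 = {stat:.2f}, {verdict} Benford at the 5% level")
```

One caveat on the design: Benford's law tends to fit data that spans several orders of magnitude, so a large statistic on a bounded quantity is not by itself evidence of fabrication, and a small one does not rule it out.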
It depends on one's probity, but yes, the phenomenon is widespread. There are numerous reasons to cite a paper and just read it superficially: padding the references list, pleasing a reviewer by adding a paper he recommended, citing friends, etc. In my own lab they frequently cite a certain theory which, if you actually read about it, has nothing to do with what they are doing.
The problem is often that reading it is insufficient. Unless you try to replicate, it may well look plausible, and often missing data or details create barriers that mean you need to want to replicate really badly to put in the effort.
[0] http://datacolada.org/98 [1] https://statmodeling.stat.columbia.edu/2021/08/19/a-scandal-...