by gershy 728 days ago
It would be so interesting if we came to a consensus that "cascading deletes" should apply to research papers. If a paper is retracted 20+ years later, and it has 4,500 references, those references should be retracted non-negotiably in cascading fashion. Perhaps such a practice could lead to better research by escalating the consequences of fraud.
14 comments

This comment suggests a lack of understanding of the role of references in papers. They aren't like lemmas in proofs. Often an author will reference a work that tried a different approach to solve the same problem the authors are trying to solve, and whether that other paper is problematic or not has nothing to do with the correctness of the paper that refers to the other work.

Now, it's possible that in a particular case, paper B assumes the correctness of a result in paper A and depends on it. But that isn't going to be the case with most references.

If there were grant money for incorrectly claiming "this other thing that isn't a computer behaves just like a computer", well, we wouldn't need VCs anymore.
> If a paper is retracted 20+ years later, and it has 4,500 references, those references should be retracted non-negotiably in cascading fashion.

Imagine you're reading a research paper, and each citation of a retracted paper has a bright red indicator.

Cites of papers that cite retracted papers get orange. Higher degrees of separation might get yellow.

Would that, plus recalculating the citation graph points system, implement the "cascading deletes" you had in mind?

It could be a trivial feature of hypertext, like we arguably should be using already. (Or one could even kludge it into viewers for the anachronistic PDF.)
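
To make the color-coding idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the paper names, the `citation_colors` function, and the assumption that the citation graph is available as a dict mapping each paper to the papers it cites. It also assumes yellow stops at two degrees of separation, which the comment leaves open.

```python
from collections import deque

# Hypothetical citation graph: paper -> papers it cites.
CITES = {
    "P1": ["R"],   # cites the retracted paper directly
    "P2": ["P1"],  # one more step removed
    "P3": ["P2"],  # two more steps removed
    "P5": ["X"],   # no path to any retraction
}
RETRACTED = {"R"}
# Distance from a retraction -> indicator shown next to a citation of that paper.
COLORS = {0: "red", 1: "orange", 2: "yellow"}


def citation_colors(cites, retracted, colors=COLORS):
    """Return {paper: color} for papers that should be flagged when cited."""
    # Invert the graph so we can walk from a cited paper to the papers citing it.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    # Breadth-first search outward from the retracted papers.
    distance = {paper: 0 for paper in retracted}
    queue = deque(retracted)
    while queue:
        current = queue.popleft()
        for citer in cited_by.get(current, []):
            if citer not in distance:  # first visit is the shortest distance
                distance[citer] = distance[current] + 1
                queue.append(citer)

    # Papers further away than the palette covers are left unmarked.
    return {p: colors[d] for p, d in distance.items() if d in colors}


print(citation_colors(CITES, RETRACTED))
```

A viewer could look each citation up in the returned dict and draw the indicator. Note that the search only knows that a citation exists, not whether the citing paper actually depends on it.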

That would be overwhelming and coarse. You wouldn’t know if an orange or yellow paper actually relies on the retracted citations or just mentions them in passing unless you dig through the paper yourself, and most people won’t do that.

I think a better method would be for someone to look over each paper that cites a retracted paper, see which parts of it depend on the retracted data, and cut and/or modify those parts (perhaps highlight in red) to show they were invalidated. Then if there are many cut or modified parts, or particularly important ones, do this for the papers that cite the modified paper, and so on.

This may also be tedious. But you can have people who aren’t the original authors do it (ideally people who like to look for retracted data), and you can pay them full-time for it. Then the researchers who work full-time reading papers and writing new ones can dedicate much less of their time to questioning the legitimacy of what they read and amending what they’ve written long ago.

I don't know which way would be better, since I don't know the subtleties of citations in different fields. I'll just note that automatically applying this modest taint to papers that cite retracted papers is some incentive for the person to be discerning in what they cite.

Of course, some papers pretty much have to be cited, because they're obviously very relevant, and you just have to risk an annoying red mark appearing in your paper if that mandatory citation is ever retracted.

But citations that are more discretionary or political, in some subfields (e.g., you know someone from that PI's lab is going to be a reviewer), if you think their pettiness might be matched by the sloppiness/sketchiness of their work, then maybe you don't give them that citation, after all.

If this means everyone in a field has incentive for citations to become lower-risk for this embarrassing taint, then maybe that field starts taking misconduct and reviewing more seriously.

Riffing on this,

I wonder if you could assign a citation tree score to each first-level citation.

For example, I cite papers A,B,C,D. Paper A cites papers 1,2,3,4. Paper 1 cites a retracted paper, plus 3 good ones.

We could say "Paper 1" was 0.75, or 75% 'truthy'. "Paper A" would be 3x good + 1x 75% = 3.75/4 = 93.7% truthy, and so on.

Basically, the deeper in the tree that the retracted paper is, the less impact it propagates forth.

Maybe you could multiply each citation by its impact factor at the top level paper.

At the top level, you'd see:

Paper A = 93.7% truthy, impact factor 100 -> 93.7 / 100 pts

Paper B = 100% truthy, impact factor 10 -> 10/10 pts

Paper C = 3/4 pts

Paper D = 1/1 pts

Total = 107.7 / 115 pts = 93.7% truthy citation list
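
The scoring scheme above can be sketched in a few lines of Python. This is only one reading of it: a retracted paper scores 0, a paper with no known citations scores 1, and any other paper scores the plain average of what it cites. The names (`truthiness`, `weighted_list_score`, the graph itself) are hypothetical, Paper C's impact factor of 4 and its citation list are assumed so that it comes out as 3/4, and the graph is assumed to have no citation cycles.

```python
# Hypothetical data reproducing the worked example.
CITES = {
    "Me": ["A", "B", "C", "D"],
    "A": ["1", "2", "3", "4"],
    "1": ["R", "g1", "g2", "g3"],  # one retracted citation plus three good ones
    "C": ["R", "g1", "g2", "g3"],  # assumed, so that Paper C scores 3/4
}
RETRACTED = {"R"}
IMPACT = {"A": 100, "B": 10, "C": 4, "D": 1}  # impact factors used as weights


def truthiness(paper, cites, retracted, memo=None):
    """Score in [0, 1]; assumes the citation graph is acyclic."""
    if memo is None:
        memo = {}
    if paper in retracted:
        return 0.0
    if paper not in memo:
        refs = cites.get(paper, [])
        if refs:
            total = sum(truthiness(r, cites, retracted, memo) for r in refs)
            memo[paper] = total / len(refs)
        else:
            memo[paper] = 1.0  # nothing known against it
    return memo[paper]


def weighted_list_score(paper, cites, retracted, impact):
    """Impact-weighted average truthiness of a paper's first-level citations."""
    refs = cites[paper]
    points = sum(truthiness(r, cites, retracted) * impact[r] for r in refs)
    return points / sum(impact[r] for r in refs)


print(truthiness("1", CITES, RETRACTED))   # Paper 1
print(truthiness("A", CITES, RETRACTED))   # Paper A
print(weighted_list_score("Me", CITES, RETRACTED, IMPACT))
```

With this data Paper 1 scores 0.75, Paper A scores 0.9375, and the weighted list score is 107.75 / 115, roughly 0.937. Because each level averages, a retraction deeper in the tree moves the top-level number less.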

If a paper has an outsized impact factor, it gets weighted more heavily, since presumably the community has put more stock in it.

Thus incentivizing authors to add citations to established papers for no reason other than to increase their own trust score. Which already happens to a degree but this would magnify that tenfold.
The question is how many of the citations are actually in support? As in: some might be citations in the form of "Donald Duck's research on coin polishing[1] is not considered due to the controversial nature". Or even "examples of controversial papers on coin polishing include the work of Donald Duck[1]".

I don't think "number of citations" typically makes this distinction?

Also for some papers the citation doesn't really matter, and you can exclude the entire thing without really affecting the paper.

Regardless, this seems like a nice idea on the face of it, but practically I foresee a lot of potential problems if done "non-negotiably".

I love the idea. It would also dampen the tendency to over-cite, and disincentivize citation rings. But mainly encourage researchers to actually evaluate the papers they're citing instead of just cherry picking whatever random crap they can find to support their idea.

Maybe negative citations could be categorized separately by the authors and not count towards the cited paper's citation count and be ignored for cascading citations.

If the citation doesn't materially affect the paper, the author can re-publish it with that removed.
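
As a rough illustration of categorizing negative citations separately, here is a small Python sketch. The `Citation` record, its `negative` flag, and both functions are hypothetical; it assumes authors label each citation themselves when they publish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing: str
    cited: str
    negative: bool = False  # hypothetical author-supplied label


CITATIONS = [
    Citation("B", "A"),                 # B builds on A
    Citation("C", "A", negative=True),  # C cites A in order to dispute it
    Citation("D", "B"),
    Citation("E", "C"),
]


def citation_count(paper, citations):
    """Count citations of `paper`, leaving out the negative ones."""
    return sum(1 for c in citations if c.cited == paper and not c.negative)


def cascade(retracted, citations):
    """Papers reached from `retracted` by following non-negative citations only."""
    flagged, frontier = set(), [retracted]
    while frontier:
        current = frontier.pop()
        for c in citations:
            if c.cited == current and not c.negative and c.citing not in flagged:
                flagged.add(c.citing)
                frontier.append(c.citing)
    return flagged


print(citation_count("A", CITATIONS))
print(sorted(cascade("A", CITATIONS)))
```

Here a retraction of A reaches B and D but not C, which cited A only to disagree with it, and so not E either. The scheme depends entirely on authors labeling their citations honestly.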

> If the citation doesn't materially affect the paper, the author can re-publish it with that removed.

This paper is 22 years old. Some authors have retired. Some are dead.

I really think that at the very least it needs a quick sniff test. Which is boring uninteresting work and with 4,500 citations that will take some effort, but that's why we pay the journals big bucks. Otherwise it's just going to be the academic variant of the Scunthorpe problem.

And/or do something more fine-grained than a binary retraction, such as adding in a clear warning that a citation was retracted and telling readers to double-check that citation specifically.

If you are cherry-picking cites that agree with you, that is a much bigger scandal than you citing a paper that ended up being retracted 22 years later. The point of citations is to cite the relevant literature, pro and con.
I guess those kinds of citations should be put in a different category that doesn't increase the citation count of the referenced paper, in other words doesn't raise its prestige. These kinds of citations shouldn't do that anyway.

So now if you want to cite some paper you have to decide which papers you'd die and live with, and consequently your paper's prestige will be dependent on how many other papers want to die and live with yours.

I guess you can have something like a nofollow attribute

https://en.wikipedia.org/wiki/Nofollow

although the incentives will be more confusing.

There's an argument to be made that citing something to disagree with it should increase its prestige but not its credibility (to the extent that those can be separated): you're agreeing that it's important.

Most citations are just noting previous work. Here are some papers citing the retracted one. (Selected randomly).

>Therefore, MSC-based bone regeneration is considered an optimal approach [53]. [0]

>MSC-subtypes were originally considered to contain pluripotent developmental capabilities (79,80). [1]

Both these examples give a single passing mention of the article. It makes no sense for thousands of researchers to go out and remove these citations. Realistically you can't expect people to perform every experiment they read before they cite it. Meanwhile there has been a lot of development in this field despite the retracted paper.

[0] https://www.mdpi.com/2073-4409/8/8/886

[1] https://www.tandfonline.com/doi/full/10.3402/jev.v4.30087

Jumping in with the others, this is not good. When I've written papers in the past, and used peer reviewed, trusted journals, what else am I supposed to do? Recreate every experiment and analysis all the way down? Even if it's an entirely SW project, where maybe one could do that, presumably the code itself is maliciously wrong. You'd have to check way too much to make this productive.
> Recreate every experiment and analysis all the way down?

If an experiment or analysis is reliant on the correctness of a retracted paper, then shouldn't it need to be redone? In principle this seems reasonable to me—is there something I'm missing?

EDIT: Maybe I misunderstood... is your point that the criterion of "cites a retracted paper" is too vague on its own to warrant redoing all downstream experiments?

I think usually there's too much building off of each other for this. Standing on the shoulders of giants and whatnot. To me that's the purpose of society and human evolution but I won't get preachy. I didn't read the stem cell paper, but I'll use it as an example. Let's say the stem cell paper says "stem cells are one type of cell from the human body" which cites some paper that first found stem cells. Maybe that paper cited the paper that first found any cells. And that one cited a paper about the molecular makeup of some part of the cell. And that cited a paper about what it means for an atom to be in a molecule. And that cited some paper about how atoms can contain electrons, and then that electrons are particles and waves.

I think, personally, it's unrealistic to expect every researcher who mentions anything that has an electron in it (aka most things) to need to recreate the double slit experiment. Or, to harvest the stem cells themselves instead of buying them from trusted suppliers. Yes, as I type this out I do see that more re-experimenting would help detect fraud. But crucially, it really doesn't matter what an electron is to people determining that stem cells are in humans. The "non-negotiably" is what worries me. There should be some negotiation to say "hey your paper uses this debunked article. You have x days to find another, proven paper that supports the argument, or remove the argument entirely, or we'll retract your paper as well." I think that's valid. Especially since the fraud here wouldn't be impacting the author using the bad paper (most of the time, I would imagine) but rather the ones writing the paper. I would hesitate to believe that people faking such crucial, potentially lifesaving research care that some nobody they'll never meet might be upset their paper doesn't make it.

I think really what I'd like to see instead is more checking done at the peer review stage. To me that's the whole point of the journal. I'm biased on this having been rejected during the peer review stage and disliking how expensive journal articles can get, but at the end of the day, that's the point of them. They should be doing everything in their power to ensure that the research is accurate. And if we can't trust that, what's the point to the journals at all? May as well just go on blogs or something.

That would certainly lead to people checking their references better. But a lot of references are just in passing, and don't materially affect the paper citing it.

One would hope that if some work really did materially depend on a bogus paper, then they would discover the error sooner rather than later.

It probably makes sense to look over papers that cite retracted papers and see if any part of them relies on the invalidated results. But unless the entire paper is worthless without them, it shouldn’t be outright retracted.

How many papers entirely depend on the accuracy of one cited experiment (even if the experiment is replicated)?

This is not at all what a citation means. If someone writes a math paper with a correct result, and the proof is wrong, then you cite that paper to give a corrected proof. If someone writes a math paper where a result itself is incorrect, then you cite that paper to give your counterexample. A citation just means the paper is related, not that it's right or you agree with it.
Just because you cite a paper doesn't mean you agree with it. At least in CS, often you're citing a paper because you're suggesting problems with it, or because your solution works better. Cascading deletes don't really help here - they'd just encourage you not to criticise weaknesses of earlier work, which is the opposite of what you're trying to achieve.
Depends on the paper, it would still require review mechanisms. “Nuke it from orbit” is an overreaction to this, as the debunked paper may play very little part other than as a reference.
I suspect this would have some unintended consequences, not all good.
Like what? Currently, there are no consequences when a paper is retracted. If we retracted more papers, what would the difference be?
Like very valid research being lost because they mention a retracted paper for some minor point that doesn’t really have a major impact on the final results.
That's already something that doesn't happen to blatantly invalid research which is retracted directly. What are you worried about?
What if the citation is "I believe this preceding study to be grievously flawed and possibly fraudulent [ref]"?
This is a completely bonkers idea that would accomplish nothing positive and would mostly erase tons of good science.

The idea of punishing third parties for a citation is weird. If I quote somebody who lied, I'm at fault? Seriously?

The priority isn't about punishing you, or about your feelings or career at all. It's about the science.

If you cite something that turns out to be garbage, I'd imagine the procedure would be to remove the citation and to remove anything in the paper that depends on it, and to resubmit. If your paper falls apart without it, then it should be binned.

I can't think of a single paper that would fall apart to any of its cited papers being retracted. What field of science operates that way?

Science papers are novel contributions of data, and sometimes of purely computational methods. A data paper will stand on its own. A method paper will usually (or at least should) operate across multiple data sets to compare performance, or if only on a single dataset it's going to be a very well-tested dataset.

If MNIST turns out to be retracted, would we have to remove all the papers that used it over the years? That's about as deep as a citation can get into being fundamental and integral to a paper. And even in that case nearly any paper operating on that dataset will also be using other datasets. Sure, ignore a paper that only evaluates on a single retracted dataset, but why even bother retracting, as the paper would be ignored anyway, because what significant paper would use a single benchmark?

But 99.9% of citations have less bearing on a paper than being a fundamental dataset used to evaluate the claims in the paper. And those citations are inherently well-tested work product already.

So if people actually care about science, they would never even propose such a scheme. They would bother to at least understand what a citation was first.

You might not be at fault but your work depends on that wrong work, so your work is probably wrong too and readers should be aware of that. If it doesn't depend on it, then don't cite it! People cite the most ridiculous crap, especially in introductions listing common sense background knowledge with a random citation for every fact. That stuff doesn't really affect the paper so it could just be couched in one big "in my opinion" instead.
Academic papers have to cite related research to situate their contribution, even if they're not directly building on that research. When researchers can't reproduce a paper's results, they have to cite that paper when reporting that, or no one will know what they're talking about and the bad paper cannot be refuted. The whole system also needs many compare and contrast citations that aren't built on directly or at all, so you know what a paper is doing and not doing.
Yea, I hadn't really considered those kinds of citations. I was thinking of the piles of worthless citations that authors often put in simply because they're supposed to cite every fact, even if it's something that's common sense which they're not treating critically and doesn't affect their own work so they just did a quick search for any paper that made that claim.
> but your work depends on that wrong work, so your work is probably wrong

No, absolutely not, that's pure fallacy.

There might be some small subset of citations that work like a mathematical proof, but how many of these 4500 citations could you find that operate that way?

> There might be some small subset of citations that work like a mathematical proof

And even then, you're just weakening the result, not throwing it out entirely: instead of a proof of X that cites a proof of Y, you have a proof that Y implies X.
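
The weakening described here can be written out as a tiny Lean sketch. The names are hypothetical: `X` stands for the citing paper's result, `Y` for the cited result, and `argument` for the citing paper's own reasoning from `Y` to `X`.

```lean
-- While the cited proof of Y stands, the paper establishes X outright.
theorem with_citation (X Y : Prop) (cited : Y) (argument : Y → X) : X :=
  argument cited

-- If the cited proof is withdrawn, the paper's own reasoning is untouched:
-- it still establishes the conditional statement Y → X.
theorem without_citation (X Y : Prop) (argument : Y → X) : Y → X :=
  argument
```

Only the hypothesis `cited` disappears in the second version; the citing paper's contribution survives as a conditional result.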

Cascading invalidate. I don’t think it should disappear, I think it should be put in deep storage for future researchers doing studies on misinformation propagation.