Hacker News
by neilv 729 days ago
> If a paper is retracted 20+ years later, and it has 4,500 references, those references should be retracted non-negotiably in cascading fashion.

Imagine you're reading a research paper, and each citation of a retracted paper has a bright red indicator.

Citations of papers that themselves cite retracted papers get orange. Higher degrees of separation might get yellow.

Would that, plus recalculating the citation graph points system, implement the "cascading deletes" you had in mind?
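To make the coloring idea concrete, here is a rough sketch, not a real implementation. It assumes the citation graph fits in a plain dict mapping each paper to the papers it cites; the names `taint_colors`, `cites`, `retracted`, and `max_depth` are all made up for illustration. It runs a breadth-first search outward from the retracted papers and colors each paper by its distance from the nearest one.

```python
from collections import deque


def taint_colors(cites, retracted, max_depth=3):
    """Color papers by citation distance to the nearest retracted paper.

    cites: dict mapping a paper ID to the list of paper IDs it cites
           (hypothetical data shape, assumed for this sketch).
    retracted: iterable of retracted paper IDs.
    max_depth: largest distance that still gets a color (an assumption).
    """
    # Reverse the edges: cited paper -> papers that cite it.
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    # Multi-source BFS starting from every retracted paper at distance 0.
    distance = {paper: 0 for paper in retracted}
    queue = deque(distance)
    while queue:
        paper = queue.popleft()
        if distance[paper] >= max_depth:
            continue  # do not propagate the taint any further
        for citer in cited_by.get(paper, []):
            if citer not in distance:
                distance[citer] = distance[paper] + 1
                queue.append(citer)

    # Retracted = red, cites a retracted paper = orange, further out = yellow.
    palette = {0: "red", 1: "orange"}
    return {paper: palette.get(d, "yellow") for paper, d in distance.items()}


if __name__ == "__main__":
    graph = {"X": ["R"], "Y": ["X"], "Z": ["Y"], "W": ["Q"]}
    print(taint_colors(graph, {"R"}))
```

A viewer would then look up each cited paper in the returned dict and draw the indicator next to the citation; papers absent from the dict get no mark.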

It could be a trivial feature of hypertext, which we arguably should be using already. (Or one could even kludge it into viewers for the anachronistic PDF.)

2 comments

That would be overwhelming and coarse. You wouldn’t know whether an orange or yellow paper actually relies on the retracted citations or just mentions them in passing unless you dig through the paper yourself, and most people won’t do that.

I think a better method would be for someone to look over each paper that cites a retracted paper, see which parts of it depend on the retracted data, and cut and/or modify those parts (perhaps highlighting them in red) to show they were invalidated. Then, if many parts were cut or modified, or the affected parts are particularly important, do the same for the papers that cite the modified paper, and so on.

This may also be tedious. But you can have people who aren’t the original authors do it (ideally people who like to look for retracted data), and you can pay them full-time for it. Then the researchers who work full-time reading papers and writing new ones can dedicate much less of their time to questioning the legitimacy of what they read and amending what they wrote long ago.

I don't know which way would be better, since I don't know the subtleties of citations in different fields. I'll just note that automatically applying this modest taint to papers that cite retracted papers gives authors some incentive to be discerning in what they cite.

Of course, some papers pretty much have to be cited, because they're obviously very relevant, and you just have to risk an annoying red mark appearing in your paper if that mandatory citation is ever retracted.

But citations that are more discretionary or political, in some subfields (e.g., you know someone from that PI's lab is going to be a reviewer), if you think their pettiness might be matched by the sloppiness/sketchiness of their work, then maybe you don't give them that citation, after all.

If this gives everyone in a field an incentive to keep their citations at lower risk of this embarrassing taint, then maybe that field starts taking misconduct and reviewing more seriously.

Riffing on this,

I wonder if you could assign a citation tree score to each first-level citation.

For example, I cite papers A,B,C,D. Paper A cites papers 1,2,3,4. Paper 1 cites a retracted paper, plus 3 good ones.

We could say "Paper 1" was 0.75, or 75% 'truthy'. "Paper A" would be 3x good + 1x 0.75 = 3.75/4 ≈ 93.7% truthy, and so on.

Basically, the deeper in the tree that the retracted paper is, the less impact it propagates forth.
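As a rough sketch of that recursion (my own reading of the scheme, not a worked-out design): a retracted paper scores 0, a paper with no known citations scores 1, and everything else scores the average of the papers it cites. The function name `truthiness` and the dict-based graph are hypothetical, and the sketch assumes the citation graph has no cycles.

```python
def truthiness(paper, cites, retracted, memo=None):
    """Recursive citation-tree score in [0, 1].

    cites: dict mapping a paper ID to the list of paper IDs it cites
           (hypothetical data shape; assumed acyclic for this sketch).
    retracted: set of retracted paper IDs.
    """
    if memo is None:
        memo = {}
    if paper in retracted:
        return 0.0  # a retracted paper contributes nothing
    if paper in memo:
        return memo[paper]

    refs = cites.get(paper, [])
    if not refs:
        score = 1.0  # no known citations: assumed fully good
    else:
        # Average the scores of everything this paper cites.
        score = sum(truthiness(r, cites, retracted, memo) for r in refs) / len(refs)

    memo[paper] = score
    return score


if __name__ == "__main__":
    graph = {
        "A": ["1", "2", "3", "4"],
        "1": ["R", "g1", "g2", "g3"],  # one retracted citation plus three good ones
    }
    print(truthiness("1", graph, {"R"}))  # 0.75
    print(truthiness("A", graph, {"R"}))  # 0.9375
```

Because each level averages over all of a paper's citations, a retraction's effect is diluted at every hop, which is the "deeper means less impact" behavior described above. A very deep graph could hit Python's recursion limit, so a real version would likely want an iterative traversal.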

Maybe you could multiply each citation by its impact factor at the top level paper.

At the top level, you'd see:

Paper A = 93.7% truthy, impact factor 100 -> 93.7 / 100 pts

Paper B = 100% truthy, impact factor 10 -> 10/10 pts

Paper C = 3/4 pts

Paper D = 1/1 pts

Total = 107.7 / 115 pts ≈ 93.7% truthy citation list
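The tally above is just a weighted average, which can be sketched as follows. This assumes each citation comes with a truthiness score and an impact-factor weight; I am reading Paper C as 75% truthy with weight 4 and Paper D as 100% truthy with weight 1, which is my interpretation of the "3/4" and "1/1" lines. The name `weighted_truthiness` is made up.

```python
def weighted_truthiness(entries):
    """Impact-weighted average of truthiness scores.

    entries: list of (truthiness, impact_factor) pairs, with truthiness
             in [0, 1] (hypothetical input shape for this sketch).
    """
    total_weight = sum(weight for _, weight in entries)
    if total_weight == 0:
        raise ValueError("at least one citation needs a non-zero weight")
    earned = sum(score * weight for score, weight in entries)
    return earned / total_weight


if __name__ == "__main__":
    citations = [
        (0.9375, 100),  # Paper A
        (1.0, 10),      # Paper B
        (0.75, 4),      # Paper C (assumed weight 4)
        (1.0, 1),       # Paper D (assumed weight 1)
    ]
    print(round(weighted_truthiness(citations), 3))  # 0.937
```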

If a paper has an outsized impact factor, it gets weighted more heavily, since presumably the community has put more stock in it.

That would incentivize authors to add citations to established papers for no reason other than to increase their own trust score. This already happens to a degree, but such a scheme would magnify it tenfold.