Hacker News
by fc417fc802 83 days ago
> Ideally, that's how it is for all papers, but it isn't

We require a method of filtering such that a given researcher doesn't have to personally vet in excruciating detail every paper he comes across because there simply isn't enough time in the day for that.

Ideally such a system would provide a reputable multi-dimensional score for each individual paper. How can such scores be calculated so that they're reputable? Who knows; that exercise is left for the reader.

In practice, "well, it got published in Nature" makes for a pretty decent spam filter, followed by metrics such as how many times it's been cited since publication, checking that the people citing it are independent authors who actually built directly on top of the work, and checking how many of those citing authors are from a different field.
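For what it's worth, the countable part of those metrics is easy to sketch. The record layout below (dicts with `authors` and `field` keys) and the name `citation_signals` are made up for illustration, and "independent" is approximated as "shares no author with the cited paper". Whether a citing paper actually built on the work can't be read off this kind of metadata, so that check is left out.

```python
def citation_signals(paper, citing_papers):
    """Count simple citation signals for one paper.

    `paper` and each entry of `citing_papers` are hypothetical records:
    dicts with an "authors" list and a "field" string.
    """
    authors = set(paper["authors"])

    # Independent: the citing paper shares no author with the cited one.
    independent = [c for c in citing_papers if not authors & set(c["authors"])]

    # Cross-field: independent citations coming from a different field.
    cross_field = [c for c in independent if c["field"] != paper["field"]]

    return {
        "total": len(citing_papers),
        "independent": len(independent),
        "cross_field": len(cross_field),
    }


example_paper = {"authors": ["alice", "bob"], "field": "physics"}
example_citing = [
    {"authors": ["alice", "carol"], "field": "physics"},  # self-citation
    {"authors": ["dave"], "field": "physics"},
    {"authors": ["erin"], "field": "biology"},
]
print(citation_signals(example_paper, example_citing))
```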

> We require a method of filtering such that a given researcher doesn't have to personally vet in excruciating detail every paper he comes across because there simply isn't enough time in the day for that.

We do require such a method. Isn't that what AI is for? Strictly working as a filter. You still need to personally vet in excruciating detail every paper you rely on for your work.

Maybe. I think that's still experimental and far too resource-intensive to do on an individual basis. However, an intensive LLM review performed by a centralized service once per paper, as a sort of independent literature watchdog, would likely be of value. I haven't heard of such a thing yet though.

Can't we do better than that?

PageRank was a decent solution for websites. Can't we treat citations as a graph, calculate per-author and per-paper trustworthiness scores, update when a paper gets retracted, and mix in a dash of HN-style community upvotes/downvotes and openly-viewable commentary and Q&A by a community of experts and nonexperts alike?
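To make the graph idea concrete, here is a minimal sketch of plain PageRank run over a citation graph, where an edge points from the citing paper to the cited one. The function name, the input shape, and the retraction policy (drop the retracted paper and every edge touching it, then recompute) are assumptions for illustration only. Per-author scores, votes, and commentary are not covered.

```python
def citation_pagerank(citations, retracted=frozenset(), damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    citations: dict mapping a paper id to the list of paper ids it cites.
    retracted: paper ids to exclude (an assumed, deliberately crude policy).
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers -= set(retracted)
    if not papers:
        return {}

    # Keep only edges between surviving papers; drop self-citations and duplicates.
    out = {
        p: list(dict.fromkeys(q for q in citations.get(p, []) if q in papers and q != p))
        for p in papers
    }

    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(score[p] for p in papers if not out[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            if out[p]:
                share = damping * score[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
        score = new
    return score


graph = {"A": ["C"], "B": ["C"], "C": []}
print(citation_pagerank(graph))                   # C collects score from A and B
print(citation_pagerank(graph, retracted={"C"}))  # recomputed without C
```

Note that this treats every citation as an endorsement, which is a strong simplification.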

Of course we could! My tongue-in-cheek "exercise is left for the reader" comment was meant to convey that it only looks simple.

Just one example off the top of my head: how do you handle negative citations? For example, a reputable author citing a known-incorrect paper in order to refute it. You need more metadata than we currently have available.
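As a rough illustration of what that extra metadata might look like, suppose each citation carried a stance label. The `Citation` record, the label values, and both helper functions below are hypothetical; nothing like this exists in current citation data, which is exactly the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source: str              # citing paper id
    target: str              # cited paper id
    stance: str = "neutral"  # hypothetical label: "supports", "refutes", "neutral"


def endorsement_graph(citations):
    """Adjacency lists that keep only citations which do not refute their target."""
    graph = {}
    for c in citations:
        graph.setdefault(c.source, [])
        if c.stance != "refutes":
            graph[c.source].append(c.target)
    return graph


def refutation_counts(citations):
    """How many times each paper is cited in order to refute it."""
    counts = {}
    for c in citations:
        if c.stance == "refutes":
            counts[c.target] = counts.get(c.target, 0) + 1
    return counts


cites = [
    Citation("B", "A", "supports"),
    Citation("C", "A", "refutes"),
    Citation("C", "B"),
]
print(endorsement_graph(cites))
print(refutation_counts(cites))
```

With labels like these, a refuting citation no longer passes credit to the paper it refutes. Who assigns the labels, and who trusts them, is the hard part.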

tl;dr just draw the rest of the fucking owl.

Upvotes, downvotes, and commentary? That's extremely complicated. Long term data persistence? Moderation? Real names? Verification of lab affiliations? Who sets the rules? How do you cope with jurisdictional boundaries and related censorship requirements? The scientific literature is fundamentally an open and above all international collaboration. Any sort of closed, centralized, or proprietary implementation is likely to be a nonstarter.

Thus if your goal is a universal system then I'm fairly certain you need to solve the decentralized social networking problem as a more or less hard prerequisite to solving the decentralized scientific literature review problem. This is because you need to solve all the same problems but now with a much higher standard for data retention and replication.

Very topically, I assume you'd need a federated protocol. It would need to be formally standardized. It would need a good story for data replication and archival, which pretty much rules out ActivityPub and ATProto as they currently stand, so you're back to the drawing board.

A nontrivial part of the above likely involves also solving the decentralized petname system problem that GNS attempts to address.

I think a fully generalized scoring or ranking system is exceedingly unlikely to be a realistic undertaking. There's no problem with isolated private venues (i.e. journals); we just need to rethink how they work. Services such as arXiv provide a DOI, so there's nothing stopping people from building "journals" that are nothing more than lightweight review platforms and don't host any papers themselves.

> Upvotes, downvotes, and commentary? That's extremely complicated.

No, it is not. Don't throw the baby out with the bath water. Zenodo is centralized, and that is fine. A system hosted by CERN would be universal enough for most purposes.

The truth is, most papers cannot stand on their own; they need a reputable venue. While it is difficult to get into Nature, it is much more difficult to actually contribute something substantial to science. That's why we don't have a system like that.

I think you've misunderstood me. Did you read my final paragraph? I was agreeing with what you wrote there - that simply rethinking how centralized journals operate could accomplish the majority of the goal while sidestepping most of the complexity.

That said, I disagree that papers require a centralized venue in any fundamental sense. They currently need such a venue because we don't have a better process for vetting and filtering them at scale. The issue is that decentralizing such a process in an acceptable manner is a monstrously complicated prospect.

You know that is what PageRank was originally for, right?

Sure. In that case I guess I'm just waiting for a couple of college kids in a garage to start a website that actually uses it for its intended purpose, so that we can finally deprecate PrestigiousPrivateJournalRank.