Hacker News
by cs702 1777 days ago
Slowly and almost imperceptibly if you look at it day-to-day, public research repositories like arXiv and bioRxiv, along with public code repositories like GitHub and GitLab, are becoming or maybe already are the world's most important academic "journals."

All research and code posted on them gets a quick once-over; good work gets the attention it deserves; bad work is quickly ignored. Reviews take place over the Internet via both public and private forums.

Gatekeeping power lies more and more in the hands of a global, distributed scientific community open to anyone willing and able to do and review the work. It's fabulous IMHO.


> All research and code posted on them gets a quick once-over; good work gets the attention it deserves; bad work is quickly ignored.

What is the basis for all these claims? Who is giving it the quick once-over?

Crowd-sourced review and information, despite some strengths and high initial hopes, has a record of extraordinary misinformation and disinformation. Why would we want to use that system for scientific research?

I prefer careful peer review, standards for announcing funders, etc.

> Who is giving it the quick once-over?

arXiv has 200 expert moderators spread through different fields to filter out papers that are blatantly misleading, unoriginal, non-substantive, or in need of significant review and revision. It's not peer review, but it's not a complete free-for-all either.

This is completely false! Please stop telling people that arXiv moderators judge the technical content of papers. That's not their role and it leads to people trusting arXiv when they should not.

It's a complete free-for-all.

arXiv moderators do not judge the technical content. They don't filter misleading submissions. They don't filter work in need of revision. This is literally described on the arXiv website: https://arxiv.org/help/moderation "What policies guide moderation before public announcement?"

They filter spammers, check for total obvious nutjobs ("My grilled cheese said P=NP"), for crazy formatting, for blatant copyright violations.

Publishing junk on arXiv is trivial if you're not too crazy and know a little bit about how to use the right words. You can publish anything.

> They don't filter work in need of revision. This is literally described on the arXiv website: https://arxiv.org/help/moderation "What policies guide moderation before public announcement?"

The page you're linking backs up what I've listed. The third subheader in that section for example:

> A submission may be declined if the moderators determine it lacks originality, novelty, or significance.

> Submissions that do not contain original or substantive research, including undergraduate research, course projects, and research proposals, news, or information about political causes (even those with potential special interest to the academic community) may be declined.

> Papers that contain inflammatory or fictitious content, papers that use highly dramatic and misrepresentative titles/abstracts/introductions, or papers in need of significant review and revision may be declined.

---

> it leads to people trusting arXiv when they should not

I'm only claiming that their moderation is a quick once-over to filter out papers blatantly in violation of those policies (like the "total obvious nutjobs" you describe), while being clear that it's not a peer review.

“May” is the key word in that long sentence. bioRxiv certainly does not review content of submissions using the standard definition of the word “review”. They may scan for style and general content type—for example rejecting reviews.

But I agree with the parent comment that these archives are extremely valuable.

I disagree that informal community comments are of much critical value. In most cases Twitter comments and micro-reviews are relatively trivial and are usually based on quick reads rather than deep perusals.

If you happen to be interested in the "my grilled cheese said P=NP" type papers [0] for entertainment value, check out Vixra [1].

Though calling them papers can be a stretch.

Here's one that "proves" P=NP [2].

[0] https://vixra.org/pdf/1903.0040v1.pdf

[1] https://vixra.org

[2] https://vixra.org/pdf/1502.0047v1.pdf

Thanks! I will look that up (but if you happen to have any links ...).

> good work gets the attention it deserves

I'm not sure I'd go that far with the optimism, at least in my field (artificial intelligence). Some obviously bad work does immediately fade into obscurity, and better work probably does on average get more attention, but the variation is huge. There is so much stuff on arXiv that to get attention you need some kind of PR push so people notice it in the firehose, or a dice-roll around a viral tweet or a science journalist noticing it. Some of the better-funded university and corporate research groups have actual professional PR and science-comm teams doing coordinated social-media blitzes, press releases, and blog posts around new arXiv papers! That's a huge factor in determining whether a given paper gets attention.

Thanks for the perspectives. One important nitpick:

> Some obviously bad work does immediately fade into obscurity, and better work probably does on average get more attention

You don't know what you haven't seen, which is just a restatement of the core problem. You'd need a study of the entire population to know about the correlation between paper 'quality' and outcomes.

The problem is that careful (conference) reviews don’t scale. Large conferences end up suffering from highly stochastic behavior, where excellent work is borderline-rejected on a regular basis while mediocre or incorrect work gets accepted every so often. GitHub/arXiv are no silver bullet but offer an interesting alternative (with their own set of challenges, though).

Every solution has flaws, as you say. The careful reviews have worked very well: science is one of the great triumphs of humanity.

That doesn't mean we shouldn't look for improvements, of course.

I'm idly curious:

Does arXiv show who reviewed it? Credentials, etc.?

Is there a trust scoring mechanism or some such?

Is there some way to show a graph of reviewing?

Is there a restriction on who can post a paper?

(I'm not saying any of these are needed to make it "respectable" or even that they should be... just wondering how arXiv does its thing)

Personally I like the overall concept of arXiv. Even with no one necessarily reviewing a paper, which is probably unlikely, the fact it's even accessible for later review when necessary is worthwhile.

Speaking of which, if anyone knows someone at arXiv, would be great if you could prod someone to get back to me about this PR or the associated email I sent them about it: https://github.com/arXiv/arxiv-browse/pull/197

It would add the ability for people to state that they have reviewed a given work. It might not be the direction they want to go in, or it might be, but so far I'm not even sure if anyone has seen it, unfortunately.

As far as I know the moderator isn’t shown, but that’s probably a good thing: things get approved on a ~overnight schedule, so publicly shaming moderators for mistakes seems counterproductive. Additionally, the moderation is very light: “April Fools” papers are common, and I’ve never heard of something being rejected.

For submitting you need to be approved by someone who already has several submissions in that field.

If you want to have to read 5,000 poorly written, error-riddled works to find a single gem, I guess it's okay. Unfortunately, I'm not immortal.

I much prefer a system where connected insiders push through un-replicatable papers in an opaque quid pro quo system so they can pad each other's tenure committee packets!

The current publication system doesn't necessarily avoid that problem either. Many papers are accepted due to the prestige of certain authors and conformity to other papers in the same field. Filters like these can lead to a false sense of security in the quality of the work.

It's really much better than that. At least for the subfields I've followed.

Then we will also have to find a decentralized rating/sorting system that works.

Finally. (That is: let us devise it for publications, then export it to the many utterly crucial contexts that can use it.)

GitHub stars / number of forks can act as coarse initial quality indicators for FOSS; why not for academic papers?

Good thing you aren’t the only one looking!