by atomack 1616 days ago
The way we used arxiv worked well in physics, though this was 15 years ago now, so things might have changed since.

arxiv was about distribution. It didn't replace peer review - articles were still submitted to journals and published there too.

If an article was posted to arxiv and not a journal, the odds of a citation went down massively. And the journal it was submitted to was a factor in whether or not we read it. When articles were eventually published, most authors also updated the preprint with the post-peer-review version.

Basically it meant that (1) it was easy to keep up to date with what everyone was working on and pick up interesting new stuff, and (2) most post-80s citations you saw in whatever paper you were reading could be looked up on arxiv, and you'd be reading them in seconds.


This is not the case in computer science and particularly machine learning, especially in recent years. You'll find many papers where a majority of references are to preprints that stay preprints forever. You'll also find many papers that have hundreds of citations, all while remaining preprints forever (and many of those citations are from forever-preprints themselves).

In machine learning, for the most part, arxiv is used to avoid peer review, or as a way to "publish" work that has been rejected by a peer-reviewed publication, of course.

And to be more cynical, it's also a convenient source of references to pad up a Related Work section and make it look like incremental work is part of a growing body of groundbreaking new work. /jaded

Edit: well, I'm not just being cynical. The fact that everyone can put their half-baked papers on arxiv means that the 90% of work that is crap, per Sturgeon's Law, is now a much bigger quantity than ever before and one must sift through reams and reams of crap before finding work that has any meaningful results to report. Again, that's the case in machine learning specifically. I don't know about other fields.

But those Arxiv papers which were not published elsewhere but which got many citations, those were read by others, i.e. reviewed by peers, i.e. peer-reviewed.

Arxiv only lacks the initial quality filter by peer review.

I'm also working in the field of machine learning. In the niche field I work in more specifically (speech recognition), I can usually still get a lot out of Arxiv-only papers. I can pretty easily see the main idea and judge whether the paper is useful for my own research, e.g. from a good experimental analysis. I don't really feel overwhelmed by the number of papers. I don't really see the problem.

Can everyone put their half-baked papers on arxiv? I see people on IRC discussing finding sponsors for their papers and whatnot, and how this part is not trivial at all. I don't know the details of how sponsoring works for arxiv, but it seems like it doesn't let everyone post their half-baked papers, at the very least.
> arxiv was about distribution. It didn't replace peer review - articles were still submitted to journals and published there too.

I'm surprised that they no longer use the term "preprint" at all; at least, it's nowhere to be found on the homepage or in the "about" section.

The consequences of this amnesia are hilarious: https://twitter.com/gustavnilsonne/status/138948729731431219...

> Why do we call it "preprints"? The term seems to imply that work is preliminary or unfinished. As far as I can tell, the term introduced by @arxiv , the first online repository for scientific manuscripts, is "e-print". Is "preprint" a marketing device invented by publishers?

I think it's because the publisher retains copyright. There is a limit on how "done" the manuscript can be and still be shared for free online. Some universities have started to fight back against this by limiting the scope of copyright restrictions that publishers can impose.
No, in most of physics there is no such limit in practice. The only difference between my published work and the preprint arxiv versions is the font and whether the layout is two columns or one column. They are word-for-word identical with identical figures.
Is it possible that you broke the rules of your journal, but nobody bothers going after a single researcher?
I'm not krastanov and everything is possible, but many publishers do not have such rules: https://en.wikipedia.org/wiki/List_of_academic_publishers_by...
In my particular case, no, there were no pro-forma rules broken.
In ML at least, the arXiv version is the “canonical” one. This is because the conferences have onerous page limits, so the official conference version is usually mangled with lots of cut content. The arXiv will have content restored and be more readable.

In theory you should read the arXiv version and cite the conference version, but often people cite arXiv and nobody cares, because Google Scholar mostly combines things properly.