by davrosthedalek 2516 days ago
The journal version and the arXiv version will never hash to the same value because they are not bit-identical. But you want to link to the peer-reviewed version, or one that is semantically identical to it. So somebody needs to check that the arXiv version is semantically identical to the journal version.
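To illustrate the bit-identity point, here is a minimal sketch in Python. The two byte strings are hypothetical stand-ins for the arXiv and journal files, not real documents; they differ by a single space, which is enough to produce unrelated digests.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for the two files: same words, one extra space.
arxiv_version = b"We prove Theorem 1.\n"
journal_version = b"We prove Theorem 1. \n"

# A cryptographic hash only answers "are the bytes identical?".
print(sha256_hex(arxiv_version) == sha256_hex(journal_version))
```

The comparison says nothing about whether the two texts mean the same thing, which is why a human check is still needed.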

You should hash the TeX, not the PDF. Alternatively, you could have both documents PGP-signed by the author together with a hash of the original TeX, if you want to make sure you get the right "semantically the same but different" version. But tbh that seems like a slippery slope I wouldn't want to go down: where do you draw the line for your semantic differences? Imagine you quote something that gets edited out; suddenly it looks like you are quoting nonsense, when it's the original reference's fault.
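A rough sketch of what "hash the TeX" could look like, in Python. The function name `tex_digest` and the normalization step (unifying line endings, dropping trailing whitespace) are assumptions made for illustration; nothing in the thread specifies them, and any real scheme would have to agree on its own rules.

```python
import hashlib


def tex_digest(tex_source: str) -> str:
    """Hash TeX source after a light, assumed normalization."""
    # Assumption: line-ending style and trailing whitespace should not matter.
    lines = tex_source.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines)
    # Also drop leading/trailing whitespace of the whole document,
    # then end with exactly one newline.
    normalized = normalized.strip() + "\n"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical sources: identical text, different line endings and spacing.
windows_copy = "\\section{Intro}\r\nHello  \r\n"
unix_copy = "\\section{Intro}\nHello\n"
edited_copy = "\\section{Intro}\nGoodbye\n"

print(tex_digest(windows_copy) == tex_digest(unix_copy))
print(tex_digest(unix_copy) == tex_digest(edited_copy))
```

This only tolerates the differences the normalization explicitly removes. Any change of wording still changes the digest, so it does not settle where to draw the line for semantic differences.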
There is no TeX source for the journal version. The point is that you don't want to trust the author to verify that the peer-reviewed and accepted version is the same as the arXiv version, and that it will not be changed. That's why people generally cite the journal version: because it's immutable.
Journal versions are simply not immutable, because they are referenced by name, not by content. I regularly see a good percentage of dead or wrong DOIs, and I've hunted my fair share of papers that were supposedly released in a journal but that only ever existed as preprints.

arXiv already accepts LaTeX and compiles it for you; we should expect the same from journals and ask them to publish the hash of the document they received.

Journal versions are referenced by journal name, volume, year, and page number, indexing a hard-copy version you can find in a library. Seems pretty immutable to me.

The journals I published in all accepted LaTeX. But they convert it to use their own layout software. The last correction steps are typically done only in this version, and the author has to backport them into their TeX code. Why should the journal have any interest in making the arXiv version more attractive?

Even if we ignore reprints, editorial series that rearrange papers (and make a paper citable in more than one way), and proceedings (which often don't properly distinguish between papers, but use author + proceedings), science simply doesn't operate on journal-published papers most of the time. The paper mills run so hot that you regularly cite preprints that get exchanged between authors directly. It regularly happens that the proof is supposedly in the "full paper", except that the "full paper" was never published.

Why would the author not be trusted? What do they stand to gain? arXiv can make the final version immutable too.
Essentially the same reason we need peer review in the first place. Many authors have strong but wrong opinions. But even without malice: some don't care that the arXiv version is slightly different from the paper.
I don't see why anyone would put different content in the two papers, since it's so easy to get ridiculed for that. I don't think arXiv has the resources to review whether the preprints are the same as the final versions, and it seems like overkill.
Also, in many cases there is a final round of modifications done by the publisher that you are not free to distribute. For journal papers, I was told that sometimes you cannot even publish the version corrected after rebuttal.