First we need a "verified" badge in biorxiv/arxiv to verify that the current preprint version is an exact copy of the one published in the journal. Then the DOI could be arranged to redirect to that copy instead.
That is not currently how the DOI infrastructure works at all.
Individual entities register DOIs, and decide where they redirect (and can change the resolution at any time). In these cases, the publisher (such as Elsevier) is the one who has registered the DOI, and they get to decide where it redirects/resolves. They also paid for the DOIs.
There are actually a (small-ish) number of DOI registrars. The largest, and most likely by far to be used for scholarly articles, is CrossRef.
Neither CrossRef nor the DOI foundation have the authority to change where a DOI resolves to, against the wishes of the DOI registrant. (It would be like a DNS registrar or the IANA deciding news.ycombinator.com should resolve somewhere other than Y Combinator wants it to -- indeed DOI works pretty analogously to DNS, probably intentionally by inspiration).
What you propose would require major changes to the social and business setup of DOI. Probably to the business/sustainability model too, because a registrant would probably be less excited to pay for a DOI they don't actually get to control the resolution of. (CrossRef and the International DOI Foundation are both non-profits. They still need to pay for their operations, and the DOI infrastructure. That is currently funded by charging registrants for DOIs). It would also require some kind of "regulatory regime" to determine who has the authority on what basis to determine where a DOI resolves (and those 'regulators' would probably increase expenses, which you need a new plan for funding), compared to the current situation where whatever entity registered a DOI decides where it resolves to (similar to DNS).
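To make the control model concrete, here is a toy sketch in Python. It is not how CrossRef or the Handle System are actually implemented; `ToyDoiRegistry`, its methods, and the example DOI and URLs are all made up for illustration. The only point is that the party who registered the identifier is the only one allowed to change where it resolves.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDoiRegistry:
    """Hypothetical in-memory stand-in for a DOI registration agency."""

    # Maps doi -> (registrant, target URL).
    records: dict = field(default_factory=dict)

    def register(self, doi: str, registrant: str, url: str) -> None:
        if doi in self.records:
            raise ValueError(f"{doi} is already registered")
        self.records[doi] = (registrant, url)

    def update(self, doi: str, requester: str, new_url: str) -> None:
        registrant, _ = self.records[doi]
        # Only the original registrant may change the resolution target.
        if requester != registrant:
            raise PermissionError(f"{requester} did not register {doi}")
        self.records[doi] = (registrant, new_url)

    def resolve(self, doi: str) -> str:
        return self.records[doi][1]


registry = ToyDoiRegistry()
registry.register("10.1234/example", "publisher", "https://publisher.example/article")

try:
    # A third party (say, a preprint server) tries to repoint the DOI.
    registry.update("10.1234/example", "preprint-server", "https://preprints.example/paper")
except PermissionError as err:
    print("refused:", err)

print(registry.resolve("10.1234/example"))
```

Redirecting to a preprint against the publisher's wishes would mean removing that permission check, which is exactly the social and business change described above.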
You need neither.
Simply hash both articles, and reference it by hash.
Then you will automatically get the right paper, no matter the source (it could even be from a bittorrent magnet link).
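A minimal sketch of what referencing by hash could look like, assuming SHA-256 and a made-up `sha256:` prefix for the identifier (neither is specified anywhere in this thread). `content_id` and `verify` are hypothetical helper names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return an identifier derived only from the bytes of the document."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Check that a copy fetched from any source matches the cited identifier."""
    return content_id(data) == expected_id


# The citation carries the identifier; the bytes can come from anywhere.
paper = b"%PDF-1.7 ... pretend these are the bytes of the paper ..."
cited = content_id(paper)

print(verify(paper, cited))                # the copy matches the citation
print(verify(paper + b" tampered", cited)) # any change breaks the match
```

For a real file you would pass `pathlib.Path(path).read_bytes()` as `data`. Note that this only proves the bytes are the ones that were cited; it says nothing about where to find them.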
DOIs are a horrible invention. They are prone to man-in-the-middle attacks and dead links; please don't use them.
A slight impediment to that is that ArXiv discards PDFs that have not been accessed in a while, and rebuilds them from TeX source if later accessed. The result may not have the same hash - I sometimes even see ArXiv PDFs with today's date in them despite being published a long time ago, because the author used the \today macro. So you would need reproducible builds for the hashes to be valid, or for ArXiv to no longer have the storage concerns that lead them to this practice. Or you could hash the TeX I suppose.
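Hashing the TeX could be sketched roughly like this. It assumes the submission is available as a mapping from file name to file contents; `source_digest` is a hypothetical helper, not anything ArXiv provides. File names are sorted and every field is length-prefixed, so the digest does not depend on archive ordering or on when a PDF was last rebuilt.

```python
import hashlib


def source_digest(files: dict[str, bytes]) -> str:
    """Hash a set of source files independently of the order they are listed in."""
    digest = hashlib.sha256()
    for name in sorted(files):
        name_bytes = name.encode("utf-8")
        content = files[name]
        # Length-prefix each field so file boundaries cannot be shifted
        # to produce the same byte stream from different inputs.
        digest.update(len(name_bytes).to_bytes(8, "big"))
        digest.update(name_bytes)
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()


submission = {
    "main.tex": b"\\documentclass{article}\n\\date{\\today}\n\\begin{document}Hi\\end{document}\n",
    "refs.bib": b"@article{example, title={Example}}\n",
}

# The source contains \today, but the source bytes never change,
# so the digest is the same no matter when the PDF is rebuilt.
print(source_digest(submission))
```

This sidesteps the \today problem only for the source; two people compiling the same source can still end up with different PDFs.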
Yeah, you should hash the TeX. It's a pity really that PDF has become the dominant publication format, it's just so bad and non machine readable. It's absurd to me that scientific publications haven't switched over to HTML, I mean that format was invented for scientific publication...
References to third-party websites can break. HTML is a living spec, so browsers can decide to break things that work today (as happened with Marquee, for example).
Even if you disallow JS entirely, and stick with just HTML/CSS, it has enough warts to not look and behave consistently over time.
Imo tex isn't much more machine readable, depending on what you want to do. Reformatting or lossy conversion to plaintext? Sure. Determining semantics? Good luck.
The journal version and the arxiv version will never hash to the same value because they are not bit-identical. But you want to link to the peer reviewed version, or one which is semantically identical to the peer reviewed version. So somebody needs to check that the arxiv version is semantically identical to the journal version.
You should hash the TeX, not the PDF. Alternatively you could have both documents PGP signed by the author with a hash of the original tex, if you want to make sure you get the right "semantically the same but different" version.
But tbh that seems to be a slippery slope that I wouldn't want to go down. Where do you draw the line for your semantic differences? Imagine you quote something which gets edited out: suddenly it looks like you quoted nonsense, while it's the original reference's fault.
There is no TeX source for the journal version. The point is that you don't want to trust the author to verify that the peer-reviewed+accepted version is the same as the arxiv version, and that it will not be changed. That's why people generally cite the journal version. Because it's immutable.
Journal versions are simply not immutable because they are referenced by name, not by content. I regularly see a good percentage of dead or wrong DOIs, and I've hunted my fair share of papers that were supposedly released in a journal, but that only ever existed in preprint.
Arxiv already accepts latex and compiles it for you, we should expect the same from journals and ask them to publish the hash of the document they received.
Journal versions are referenced by journal name, volume, year, and page number, indexing a hard copy version you can find in a library. Seems pretty immutable to me.
The journals I published in all accepted latex. But they convert it to use their layouting software. The last correction steps are typically done only in this version, and the author has to backport them into their tex code.
Why should the journal have any interest in making the arxiv version more attractive?
Essentially the same reason we need peer review in the first place. Many authors have strong but wrong opinions. But even without malice: some don't care that the arxiv version is slightly different from the paper.
Also in many cases there is a final round of modifications done by the publisher that you are not free to distribute. For journal papers I was told that sometimes you cannot even publish the corrected version after rebuttal.
It's not the same file - just the same, final text proof. It will be different from the final formatting in the journal.
I don't think authors have an incentive to abuse the system. Just upload the final proof of your manuscript to arxiv, click "final version", and this lets people know that this is the same article as in the journal.
DOIs are ubiquitous and they would serve the purpose of redirecting to the free PDFs rather than the journal site. This can be applied to existing articles retroactively. Plus, many bibliography styles include the DOI, which makes the reference easier to use.