Hacker News
by MrGunn 2196 days ago
Hey Gwern, big fan of your GPT2 work. I notice I'm surprised to hear you say you struggle daily to fix broken links to the Elsevier catalog at ScienceDirect, because the links are used by libraries all over the world & they don't have the same feedback. Would you have a few examples available for me to send to the folks responsible?

Nature does it all the time. Here's one I fixed just this morning when I noticed it by accident: http://www.nature.com/mp/journal/vaop/ncurrent/full/mp201522... (Note, by the way, how very helpfully Nature redirects it to the homepage without an error. That's what the reader wants, right? To go to the homepage and for Nature to deliberately conceal the error from the website maintainer? This is definitely what every 'archival quality' journal should do, IMO, just to show off their top-notch quality and helpful ways and why we pay them so much taxpayer money.) Oh, SpringerLink broke a whole bunch which I am still fixing, here's two from yesterday: http://www.springerlink.com/content/5mmg0gmtg69g6978/ http://www.springerlink.com/content/p26143p057591031/ And here's an amusing ScienceDirect example: https://www.sciencedirect.com/science/article/pii/S000632071... (I would have loads more specifically ScienceDirect examples except I learned many years ago to never link ScienceDirect PDFs because the links expire or otherwise break.)
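For anyone maintaining a link checker, the homepage-redirect behavior described above can be caught with a simple heuristic. This is only a sketch in Python: the function name `looks_like_soft_404` and the example.org URLs are made up for illustration, and the rule (deep link in, site root out) is an assumption that will miss other kinds of silent breakage.

```python
from urllib.parse import urlsplit


def looks_like_soft_404(requested_url: str, final_url: str) -> bool:
    """Heuristic: a deep link that lands on the site root was probably broken.

    `requested_url` is the link as written in the page; `final_url` is where
    the HTTP client ended up after following redirects (hypothetical inputs).
    """
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    # The original link pointed at something below the root...
    requested_is_deep = requested.path.strip("/") != ""
    # ...but the server sent us to the bare homepage with no query string.
    final_is_root = final.path.strip("/") == "" and not final.query
    return requested_is_deep and final_is_root


# Illustrative URLs only; these are not the links from the comment above.
print(looks_like_soft_404("https://example.org/journal/article/123",
                          "https://example.org/"))
```

The final URL would come from whatever client does the checking, for example `urllib.request.urlopen(url).geturl()`. A 200 status alone says nothing here, which is exactly the problem with this kind of redirect.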
Isn't this exactly the intended use-case for the DOI?

Your first article has the DOI 10.1038/mp.2015.225, and the resulting link (https://doi.org/10.1038/mp.2015.225) properly directs to the article's present location.
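If you store the bare DOI rather than the publisher URL, turning it back into a link is mechanical. A minimal Python sketch, assuming the DOI starts with the usual `10.` prefix; the helper name `doi_to_url` and the list of prefixes it strips are my own choices, not anything official.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI (bare, 'doi:'-prefixed, or already a resolver URL) into a link."""
    doi = doi.strip()
    # Strip a few common prefixes so the stored form can be messy (assumed list).
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode anything unusual but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("10.1038/mp.2015.225"))
# -> https://doi.org/10.1038/mp.2015.225
```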

DOIs link to paywalls or temporarily-unembargoed papers, have to be hunted down (many places hide the DOI in a tab or, like JSTOR, actually bury it in the HTML source itself!), and break things like section links as well. Adding yet another level of indirection is not my idea of a solution, and it hardly speaks well of 'archive-quality publishers' that we have to resort to third parties to work around their hideously broken websites which, like Nature, go out of their way to make links not just broken but actively misleading.
To solve your immediate problem, just grab the DOI here: https://apps.crossref.org/SimpleTextQuery They also have an API from which you can fetch DOIs in various ways.
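As a rough sketch of the API route in Python: the Crossref REST API accepts a free-text citation in the `query.bibliographic` parameter on its `/works` endpoint and returns matches under `message.items`. The helper names below are hypothetical, the code only builds the request URL and reads an already-parsed response, and the top hit should be checked by eye since it is a fuzzy match.

```python
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(citation: str, rows: int = 1) -> str:
    """Build a Crossref /works query URL for a free-text citation string."""
    params = {"query.bibliographic": citation, "rows": rows}
    return CROSSREF_WORKS + "?" + urlencode(params)


def first_doi(response_json: dict) -> Optional[str]:
    """Pull the DOI of the top hit out of a parsed Crossref response, if any."""
    items = response_json.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


# "some title" is a placeholder citation, not a real paper.
print(crossref_query_url("some title"))
# -> https://api.crossref.org/works?query.bibliographic=some+title&rows=1
```

Fetching the URL and calling `json.loads` on the body is left to whatever HTTP client you already use.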

DOIs are a solution to the issue of having persistent, publisher-independent links that will always resolve, even if a journal changes publisher or goes out of business. Academia uses them because link rot is unavoidable across the web, but there must always be a link to the publication that resolves, so that when someone in 2070 wants to follow a citation in the references of a work published today, they can do that. It's the same thinking that underlies people pointing to the Internet Archive in Wikipedia citations. It's a layer of redirection, but one that preserves accessibility for the long term. It's also the same thinking that underlies DNS. There shouldn't be one company that controls how to resolve a domain name to an IP address, and likewise you shouldn't have to go through one publisher to resolve a reference to a research article.

As a side note, Crossref is staffed with exactly the sort of web geeks that you would see at an Internet Archive get-together (#).

So I hear your frustrations, but I think you're giving DOIs short shrift.

(#) I mean, just look at this. A dump of all journal metadata on Academic Torrents. Is that not cool? https://www.crossref.org/blog/free-public-data-file-of-112-m...

Do Nature's spinoffs have any prestige any more? Anything in a Nature spinoff related to batteries comes across as PR Newswire level material. If that.
Journals do transfer among publishers, go out of business, etc., so you shouldn't expect a direct link like that to be stable. The recommended practice is to use the DOI. Would using a DOI meet your needs?