by telotortium
582 days ago
Gwern, have you considered hosting your archived docs on a different subdomain (e.g., doc.gwern.net) to make it clearer that they are not something you have authored yourself? Not sure what the best subdomain would be though.
Regardless, I am puzzled how OP got this URL in the first place. He wasn't supposed to; he was supposed to get the canonical Arxiv PDF link, because this is one of the cache mirrors/local archives†, rather than a regular hosted document. We block everything in /doc/www/ via robots.txt & HTTP no-archive/crawl/mirror/etc. headers, and we use JS to swap out the local URL for the original URL whenever the reader clicks, mouses over, or otherwise interacts with a link to the URL in a web page (and that is the only place they should be publicly listed or accessible). If OP read it on gwern.net by seeing a link to it, and he wanted to copy the URL elsewhere, he should have just gotten the canonical "https://arxiv.org/pdf/2108.07686#page=85"... But somehow he didn't.
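For readers unfamiliar with how a robots.txt block on a directory behaves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text, the bot name, and the mirror path below are illustrative assumptions, not the site's actual file or URLs; the only detail taken from the comment is that everything under /doc/www/ is disallowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: only the /doc/www/ rule comes from the
# comment above; the real file may contain other rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /doc/www/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT,
                 agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical mirror path under the blocked directory.
    print(is_crawlable("https://gwern.net/doc/www/example/mirror.pdf"))
    # A regular page outside the blocked directory.
    print(is_crawlable("https://gwern.net/archiving"))
```

Note that robots.txt only restrains crawlers that choose to honor it; it does nothing about a human copying a URL, which is why the JS URL swapping is a separate mechanism.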
OP, do you remember how exactly you grabbed this URL? Is this an old link from before our URL swapping was implemented, or did you deliberately work around it, or did you find some place we forgot to swap, or what?
(If anyone is wondering why I mirror Arxiv PDFs like this in the first place: it's for the PDF preview feature in the popups. Because Arxiv blocks itself from being loaded in iframes, we need local mirrors for PDF preview to work at all; local mirrors save a new domain lookup and speed up the PDF preview a lot, because we compress the PDF more thoroughly and Arxiv servers are always overloaded; and because readers can potentially pop up many Arxiv PDFs easily, it saves Arxiv a lot of bandwidth and avoids burdening their servers further, so it's just the responsible thing to do.)
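The comment does not say how Arxiv blocks iframe loading. The usual mechanisms are the `X-Frame-Options` response header and the `frame-ancestors` directive of `Content-Security-Policy`. The sketch below is a simplified check of a response's headers for either one; the function name and sample headers are hypothetical, and which header Arxiv actually sends is an assumption left open here.

```python
def framing_blocked(headers: dict) -> bool:
    """Rough check: do these response headers forbid third-party iframes?

    Simplification: a frame-ancestors list naming specific hosts is treated
    as "not blocked", although it would still block any site not listed.
    """
    normalized = {k.lower(): v.lower() for k, v in headers.items()}

    # Legacy header: DENY and SAMEORIGIN both stop embedding on other sites.
    xfo = normalized.get("x-frame-options", "").strip()
    if xfo in ("deny", "sameorigin"):
        return True

    # Modern equivalent: the CSP frame-ancestors directive.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0] == "frame-ancestors":
            sources = parts[1:]
            return all(s in ("'none'", "'self'") for s in sources)

    return False


if __name__ == "__main__":
    # Hypothetical header sets, for illustration only.
    print(framing_blocked({"X-Frame-Options": "DENY"}))
    print(framing_blocked({"Content-Type": "application/pdf"}))
```

When a check like this comes back true for a remote PDF, an iframe-based preview has to load a same-site copy instead, which is the situation the local mirrors address.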
†https://gwern.net/archiving#preemptive-local-archiving