by kragen 456 days ago
Sheesh, just use BitTorrent. That's what open access licensing is for! BitTorrent's tit-for-tat approach limits the harm selfish actors can do, greatly rewarding only those whose behavior benefits others, and it has been shown to be very robust against active disruption attempts for decades now. Moreover, it also confers some resistance to falsification of the published record, to linkrot, and to publishing companies going bankrupt.

Sooner or later we need to take back the legitimate internet from surveillance capitalism. Capitalism is great (it shares many of BitTorrent's virtues, not coincidentally) but surveillance capitalism is not.


As much as I like BitTorrent, people (usually) don't want to provide open access to information; what they (usually) want is to be an "open" gateway to that information, as long as they are the centralized point of distribution whose name appears in the URL bar, and as long as they control when they can remove access to that information.

Creating a torrent is not showy enough, because the credit is "just" another file and/or a comment in the torrent metadata.

Granted, they usually do that because they want to "kindly" advertise a way to donate to them (EDIT: or to track you, or other similar goals), and there's nothing wrong with trying to get donations, but there's clearly a conflict of interest at play here.

It doesn't matter what people usually want. It's sufficient for someone to want to torrent the open-access articles, even if everyone else is playing the exploitative games you're describing. The Berlin Declaration that defined "open access" (https://openaccess.mpg.de/Berlin-Declaration) specifically requires:

> The author(s) and right holder(s) of such contributions grant(s) to all users a free, irrevocable, worldwide, right of access to, and a license to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship (community standards, will continue to provide the mechanism for enforcement of proper attribution and responsible use of the published work, as they do now), as well as the right to make small numbers of printed copies for their personal use.

This guarantees that such torrents are legal unless the original authors are infringing copyright.

So there is no danger of AI bots destroying open access.

Someone still needs to assemble those documents to create the torrent collections in the first place. That's harder now that captchas and other access walls are getting more and more hostile to human consumption.

So, yes, torrents help to preserve what has already been archived, but we still need a lot more works to be deposited in open archives like Zenodo or arXiv in the first place.

Yes, that is a problem. Publishers should seed the torrents and publish their hashes, thus vouching for the authenticity of the documents therein. But even if they don't, it only takes one trustworthy researcher passing the Captchas to add an open-access document to a legal archive which provides periodic torrents.
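
A minimal Python sketch of what "publish their hashes" could look like, assuming plain SHA-256 digests over each file. A real torrent carries its own piece hashes and info-hash instead, so this is only a stand-in for the idea; `sha256_hex`, `verify`, and the sample manifest are hypothetical names made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(documents: dict[str, bytes], published: dict[str, str]) -> dict[str, bool]:
    """Check each downloaded document against the digest the publisher listed.

    A document with no published digest is reported as not verified.
    """
    return {
        name: sha256_hex(body) == published.get(name)
        for name, body in documents.items()
    }


# Hypothetical example data: the "publisher" lists one digest per file.
paper = b"Example open-access article text."
manifest = {"paper.pdf": sha256_hex(paper)}

# A faithful copy verifies; a copy altered by even a few bytes does not.
result_ok = verify({"paper.pdf": paper}, manifest)
result_bad = verify({"paper.pdf": paper + b" tampered"}, manifest)
```

Anyone who got the file from an untrusted peer can run the same check, which is why it matters less who seeds than who vouches for the digests.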

To be clear, neither Zenodo nor arXiv rejects non-open-access papers, so you cannot legally offer a torrent of everything on arXiv just because it is on arXiv.

Diamond open access and fully OA publishers might do that (maybe as an add-on service on top of LOCKSS or Portico), but the Big Five definitely do not want to do that, because they sell bulk access services at lofty prices.

There's no use expecting anything from the publishers. Universities and independent archives need to do the job.

https://blog.archive.org/2020/09/15/how-the-internet-archive...

As for arXiv, they already make the full dump available:

https://info.arxiv.org/help/bulk_data.html

Even for works not published on arXiv with a Creative Commons license, the basic arXiv license gives them the right to do so:

https://info.arxiv.org/help/license/index.html

You can torrent the arXiv dump with Internet Archive torrents:

https://archive.org/details/arxiv-bulk

That's wonderful! I didn't know that! But, as I read it, the basic arXiv license only permits the arXiv itself to redistribute, not anyone. So it's not clear that participation in the torrent is legal, particularly after the arXiv is shut down.

torrents work fine for immutable artefacts. linux distro ISOs etc are not getting modified after release. that's a new version. same with films (piracy). once a film is released, that's it, released. any later versions are just that, a new version.

torrents are a problem for mutable artefacts because you are reliant on your peers having the latest up to date copy, which is not guaranteed. the peers you download from might have just switched their machine on after 5 months, so their copy of the mutable artefact is 5 months out of date. as ever with distributed systems, requiring consistency introduces complexity.

"open gateways" (term used by a sibling comment) provide much simpler mutability. which makes sense when there's like a simple typo in a PDF document that requires the document's replacement. just replace the document on the web server. bam! everyone now has access to the latest corrected version immediately.

also, most of the general population doesn't know how to use torrents. just because there's a part-way technical solution, doesn't mean it makes sense to switch everything over to some fancy new proposal (not the underlying tech, which is old now).

if users would struggle to use the implementation, why do it? what benefits are there except for a seemingly more technically perfect solution?

Published papers, just like accounting records, are immutable artifacts. People need to be able to say, "In Arneson & Dijksterhuis 2014, the figure given for the solanine concentration in eggplants is an order of magnitude too high because of a typographical error," or, "In Arneson & Dijksterhuis 2014, the figure given for the solanine concentration in eggplants is correct to within 25% and does not contain a typographical error," and for other people to be able to verify that. "Just replac[ing] the document on the web server" is considered serious academic misconduct, among other things because you might be introducing errors into previously correct published work that others had referenced. For this case, torrents' immutability is a feature, not a bug.

In 01994 the general population didn't know how to use the internet, but it was already very useful for researchers. Software improved over time to simplify using it.

What benefit would there be? The benefit would be that it prevents AI bots from destroying Open Access.

okay, let’s run with your specific example.

how do you deal with retractions? how do you deal with academic/research conduct so egregious that all previous versions of a retracted paper need to be edited with “RETRACTED” in big red letters over all the text on every page of the paper? just to make sure no-one ever accidentally reads one page and thinks it is a legitimate source of information.

like the one written by disgraced ex-doctor andrew wakefield: https://www.thelancet.com/journals/lancet/article/PIIS0140-6...

immutability doesn’t stop academic misconduct. in this specific egregious example it would enable serious harm to continue until every peer updates to the new versions. and there is no guarantee that happens.

the lancet, with their mutable web server hosted versions, were able to edit it and stick RETRACTED in big red letters all over the thing immediately [0]. the ability to edit due to misconduct is guaranteed.

like, i’m all for anything that stops OpenAI spamming web servers, or more generally anything that gets in their way. but there isn’t a perfect technical solution. torrents don’t solve the problem perfectly, they bring new trade offs.

that’s what i’m trying to help you see here. it’s mostly shades of grey.

[0]: by immediate, i mean “once they finally made the decision to retract it waaaaaaaaaay later than they should have”. like, the update was immediate. not that the paper was immediately retracted on publication.

By publishing a retraction notice, as academic journals have been doing for centuries on paper.
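
A rough Python sketch of how that could carry over to immutable files, assuming notices are stored as separate records keyed by the SHA-256 digest of the paper they refer to. The `Notice` record, the `notices_for` helper, and the sample data are all hypothetical; nothing here is an existing journal or BitTorrent mechanism.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Notice:
    """A separately published statement about one exact version of a paper."""
    target_digest: str  # SHA-256 hex digest of the paper it refers to
    kind: str           # e.g. "retraction" or "correction"
    text: str


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def notices_for(document: bytes, notices: list[Notice]) -> list[Notice]:
    """Return every notice that refers to exactly this copy of the document."""
    d = digest(document)
    return [n for n in notices if n.target_digest == d]


# Hypothetical example: the original bytes are never edited; the notice is added alongside.
original = b"Example paper, as first published."
log = [Notice(digest(original), "retraction", "Retracted by the journal.")]

found = notices_for(original, log)
unrelated = notices_for(b"Some other paper.", log)
```

The original stays verifiable byte for byte, and a reader that checks the notice log sees the retraction. Whether readers actually fetch an up-to-date log is the same freshness problem raised above, so this moves the trade-off rather than removing it.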