HN Mirror

	by lithiumii 1000 days ago
	How about staying anonymous and just violating all the copyright laws? There's already pirate bay, libgen, sci-hub, zlibrary, etc.; surely it's possible for there to be an open-source, pirate LLM.

CamperBob2 1000 days ago

If it were practical to mirror sci-hub and libgen, that would be one thing, but despite a lot of talk online, I have yet to see a practical way to put my hands on such a thing.

vlabakje90 1000 days ago

I'm not sure what you are referring to with 'such a thing', but mirroring libgen and zlib is really not hard. Libgen offers torrent links, as does Anna's Archive. The libgen domains are fragile, but here's a link to the Anna's Archive torrents: https://annas-archive.org/torrents. They even have a page about training LLMs on this data: https://annas-archive.org/llm

roel_v 1000 days ago

Do any of these methods actually work, though? Last time I looked (admittedly six months or so ago), there were zero seeders on the torrents.

CamperBob2 1000 days ago

Exactly. It's easy to say "Just torrent it," but that requires a lot of people to stick their necks out, including the user who just wants a copy of the data.

We need the ability to circulate HDDs physically in a semi-organized fashion, samizdat-style.

input_sh 1000 days ago

Mirroring libgen is definitely within reach, it's "just" 50 or so terabytes with torrents freely available for bulk downloading.

Realistically, maybe only 10% of that is actually useful, but reaching that 10% is gonna be very labour-intensive. You would have to do a lot of cleanup across different formats, duplicate uploads, different editions of the same book, scanned PDFs, and whatnot, while big players with their own ebook stores (Amazon, Google, Apple, any ebook store, really) already have all the proper metadata, a common format to work with, and far fewer duplicates.

john_minsk 1000 days ago

Isn't there some kind of standard for publication metadata? One that would let you uniquely identify a publication and track its different editions as children of the "original" publication? Maybe we should create one and make it freely available?

thelittleone 1000 days ago

How would one anonymously train an LLM large enough to deliver the needed performance? Doesn't that require hundreds or thousands of expensive Nvidia GPUs?

bionhoward 1000 days ago

Hardware keeps getting better, the masses have amassed quite a lot of it already, and it depends on how soon you need your AI.

BeFlatXIII 1000 days ago

A hostile foreign nation trains it and releases it under the persona of an anonymous hackerman.

yanker2 998 days ago

Basically the plot of Ghost in the Shell.