Hacker News
by legatus 2391 days ago
There are ways to do so. The archive is made up of many, many torrents (I believe they're updated along with the database monthly, if not biweekly). If you have the storage and bandwidth for the whole 32 TB, please get in touch and I may be able to help you get the whole deal without too much hassle. Otherwise, just pick some torrents (ideally based on torrent health, but there are too many to check manually) and try to keep seeding as much as possible.

EDIT: To check the health of LibGen's torrents, see this Google Sheet: https://docs.google.com/spreadsheets/d/1hqT7dVe8u09eatT93V2x...

Thanks to frgtpsswrdlame for the heads-up.


If LibGen could publish all of its torrents in a JSON payload with health metadata, seedboxes could consume it to select and prioritize torrents automatically. Check out ArchiveTeam's Warrior JSON project payload [1] for inspiration. It need not even be generated on demand; render it on a schedule and serve it from known endpoints.

[1] https://warriorhq.archiveteam.org/projects.json
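As a minimal sketch of the consumer side, assuming a hypothetical payload: the field names (name, size_bytes, seeders, leechers), the sample numbers and the greedy "fewest seeders first" policy are illustrative assumptions, not a format LibGen or ArchiveTeam actually publishes. A seedbox could load the feed and fill its storage budget with the least-seeded torrents:

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical feed with made-up numbers; the field names are assumptions for illustration.
PAYLOAD = """
{"torrents": [
  {"name": "r_000", "size_bytes": 10000000000, "seeders": 0, "leechers": 2},
  {"name": "r_001", "size_bytes": 12000000000, "seeders": 1, "leechers": 5},
  {"name": "r_002", "size_bytes": 8000000000, "seeders": 0, "leechers": 6},
  {"name": "r_003", "size_bytes": 15000000000, "seeders": 9, "leechers": 1}
]}
"""


@dataclass
class TorrentInfo:
    name: str
    size_bytes: int
    seeders: int
    leechers: int


def load_torrents(payload: str) -> list[TorrentInfo]:
    """Parse the hypothetical JSON feed into TorrentInfo records."""
    return [TorrentInfo(**t) for t in json.loads(payload)["torrents"]]


def pick_for_budget(torrents: list[TorrentInfo], budget_bytes: int) -> list[str]:
    """Greedily choose the least-seeded torrents that fit in the storage budget."""
    chosen: list[str] = []
    used = 0
    # Fewest seeders first; among equals, prefer more leechers (more demand).
    for t in sorted(torrents, key=lambda t: (t.seeders, -t.leechers)):
        if used + t.size_bytes <= budget_bytes:
            chosen.append(t.name)
            used += t.size_bytes
    return chosen


if __name__ == "__main__":
    print(pick_for_budget(load_torrents(PAYLOAD), budget_bytes=25 * 10**9))
```

Greedy selection by seeder count is simple but can leave part of the budget unused; a knapsack-style choice would pack storage more tightly at the cost of more computation.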

Actually, there is now a Google Sheet showing the health of the torrents, so it should be easy to pick the most helpful ones. It's linked in this post: reddit.com/e3yl23
I'm pretty surprised by the lack of seeders. Out of the 2438 torrents listed, a third have 0 seeders, another third have 1 seeder, and all but 5 have fewer than 10. Hopefully the publicity boosts those numbers.
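To reproduce this kind of breakdown, one could export the sheet as CSV and bucket the seeder counts. A minimal sketch, assuming the export has a column named seeders (a hypothetical name; the real sheet's headers may differ) and using a small made-up sample rather than the sheet's actual data:

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a CSV export of the health sheet.
SAMPLE_CSV = """torrent,seeders
r_000,0
r_001,1
r_002,0
r_003,4
r_004,12
r_005,1
"""


def seeder_buckets(csv_text: str, column: str = "seeders") -> Counter:
    """Count torrents with 0, 1, 2-9 and 10+ seeders."""
    buckets: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        n = int(row[column])
        if n == 0:
            buckets["0"] += 1
        elif n == 1:
            buckets["1"] += 1
        elif n < 10:
            buckets["2-9"] += 1
        else:
            buckets["10+"] += 1
    return buckets


if __name__ == "__main__":
    counts = seeder_buckets(SAMPLE_CSV)
    total = sum(counts.values())
    for label in ("0", "1", "2-9", "10+"):
        print(f"{label:>4} seeders: {counts[label]:3d} ({counts[label] / total:.0%})")
```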
From what I've heard, a good chunk of people rotate which LibGen torrents they seed, because their seedboxes can't handle the connections for every torrent at once.
Is there some tool or documentation describing this practice?
I'm sure someone could get you the info you need to get set up as a seeder. For modern clients it's rather trivial to manage that many torrents. Get any decent modern CPU, 4 GB+ of RAM, and $560 in storage and you're off.
I think the problem is that, given the size of each torrent and that there are 1000 of them, it's difficult to seed them all effectively at once, so people would rather seed one section at a time and rotate through them.

I'm not sure how people set up the rotation, though. That can't be a very common feature, but I could be wrong.
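One way to do the rotation by hand is to split the torrent list into fixed-size batches and advance to the next batch on a schedule. A minimal sketch, assuming a weekly cadence, a batch size that fits the seedbox and hypothetical torrent names; actually pausing and resuming torrents would go through whatever interface your client exposes, which is not shown here:

```python
from __future__ import annotations

from datetime import date, timedelta


def batches(torrents: list[str], batch_size: int) -> list[list[str]]:
    """Split the torrent list into consecutive fixed-size batches."""
    return [torrents[i:i + batch_size] for i in range(0, len(torrents), batch_size)]


def active_batch(torrents: list[str], batch_size: int, today: date) -> list[str]:
    """Return the batch to seed now, advancing one batch every 7 days and wrapping around."""
    groups = batches(torrents, batch_size)
    period = today.toordinal() // 7  # increments once every seven days
    return groups[period % len(groups)]


if __name__ == "__main__":
    names = [f"r_{i:03d}" for i in range(10)]  # hypothetical torrent names
    start = date(2019, 12, 1)
    for week in range(4):
        day = start + timedelta(weeks=week)
        print(day, active_batch(names, batch_size=4, today=day))
```

Batching by list position is the simplest scheme; sorting the list by seeder count first would make the earliest batches target the weakest swarms.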

There are features that periodically prioritize torrents with a low seed/leech ratio. It also partially auto-balances, because a swarm only needs a little more than a unity ratio (about one full copy of the data) injected into it to replicate itself fully. So each torrent that gets chosen because of a low seed/leech ratio will drop out of that criterion as soon as its swarm is self-sufficient.
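The selection rule described above can be sketched as a function that is re-run periodically: pick the swarms with the lowest seeders-to-leechers ratio, skip any that already look self-sufficient, and treat a swarm as served once a bit more than one full copy has been uploaded into it. The 1.0 threshold, the 10% margin and the dict fields are illustrative assumptions, not settings of any particular client:

```python
from __future__ import annotations


def seed_leech_ratio(swarm: dict) -> float:
    """Seeders per leecher; a swarm with no leechers has no unmet demand."""
    if swarm["leechers"] == 0:
        return float("inf")
    return swarm["seeders"] / swarm["leechers"]


def pick_neediest(swarms: list[dict], slots: int, target_ratio: float = 1.0) -> list[str]:
    """Names of up to `slots` swarms below target_ratio, lowest ratio first."""
    needy = [s for s in swarms if seed_leech_ratio(s) < target_ratio]
    needy.sort(key=seed_leech_ratio)
    return [s["name"] for s in needy[:slots]]


def injected_enough(uploaded_bytes: int, torrent_bytes: int, margin: float = 1.1) -> bool:
    """True once roughly one full copy, plus a margin, has been uploaded to the swarm."""
    return uploaded_bytes >= margin * torrent_bytes


if __name__ == "__main__":
    swarms = [
        {"name": "A", "seeders": 0, "leechers": 5},
        {"name": "B", "seeders": 1, "leechers": 2},
        {"name": "C", "seeders": 3, "leechers": 1},
        {"name": "D", "seeders": 0, "leechers": 0},
    ]
    print(pick_neediest(swarms, slots=2))  # A and B are below the threshold
    # Once A's leechers finish and become seeders, A drops out on the next pass.
    swarms[0] = {"name": "A", "seeders": 5, "leechers": 0}
    print(pick_neediest(swarms, slots=2))
```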
Why doesn't someone maintain a single torrent containing a snapshot of the full archive at a given point in time, updated (say) monthly?

I want a full mirror, and ain't nobody got time to deal with 2000 torrents, many of which have no seeders. That's a really dumb way to run this particular railroad.

Because torrent clients can't handle that many pieces in a single torrent. Some of the algorithms involved are super-linear, maybe even quadratic or worse, and they start causing trouble in the TB range.
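For a sense of scale, here is the piece arithmetic for one 32 TB torrent (taken as 32 * 10^12 bytes). In BitTorrent v1 the info dictionary stores a 20-byte SHA-1 hash per piece, and a peer's bitfield message carries one bit per piece. This sketch only counts those per-piece structures; it does not model the super-linear client algorithms mentioned above, and the piece sizes are just example values:

```python
MiB = 2**20
TOTAL_BYTES = 32 * 10**12  # "32 TB", read as decimal terabytes


def piece_stats(total_bytes: int, piece_bytes: int) -> dict:
    """Per-piece overheads for a single BitTorrent v1 torrent of the given size."""
    pieces = -(-total_bytes // piece_bytes)  # ceiling division: last piece may be short
    return {
        "pieces": pieces,
        "hash_bytes": 20 * pieces,  # one SHA-1 digest per piece in the info dict
        "bitfield_bytes": (pieces + 7) // 8,  # one bit per piece, padded to whole bytes
    }


if __name__ == "__main__":
    for piece_mib in (1, 4, 16):
        s = piece_stats(TOTAL_BYTES, piece_mib * MiB)
        print(
            f"{piece_mib:>2} MiB pieces: {s['pieces']:>10,} pieces, "
            f"{s['hash_bytes'] / 1e6:,.1f} MB of hashes, "
            f"{s['bitfield_bytes'] / 1e3:,.1f} kB bitfield"
        )
```

Even at 16 MiB pieces that is about 1.9 million pieces and roughly 38 MB of hashes in the .torrent file alone, whereas splitting the same data evenly across 2,000 torrents (16 GB each) keeps each one under a thousand pieces at the same piece size.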

Also, the UI for adding many torrents is much nicer than the UI for selecting a non-trivial subset of files inside a single torrent. And many parts of the ecosystem handle partial seeds poorly: peers that seed only a subset, and will keep doing so for the foreseeable future without leeching any other parts, often get treated as leechers despite not really being leechers.
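A simplified model of that misclassification: if a peer counts as a seeder only when it holds every piece, a peer that has finished the files it wants and will never download the rest lands in the leecher column. The functions and piece sets below are hypothetical illustrations, not how any particular tracker or client computes its counts:

```python
from __future__ import annotations


def naive_role(have: set[int], total_pieces: int) -> str:
    """Seeder only if the peer holds every piece; everyone else is a leecher."""
    return "seeder" if len(have) == total_pieces else "leecher"


def aware_role(have: set[int], total_pieces: int, wanted: set[int]) -> str:
    """Also recognise peers that hold everything they want but not the whole torrent."""
    if len(have) == total_pieces:
        return "seeder"
    if wanted <= have:  # finished its chosen subset and is not downloading more
        return "partial seed"
    return "leecher"


if __name__ == "__main__":
    total = 8
    subset = {0, 1, 2, 3}  # the peer only selected the first half of the files
    print(naive_role(subset, total))          # leecher
    print(aware_role(subset, total, subset))  # partial seed
```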

TL;DR: 2k torrent files are just a watch folder and a cp * watchfolder/ away from working. One fat 32 TB torrent, however, does not scale.
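Many clients can be configured to add any .torrent file that appears in a chosen directory automatically, which is what makes the cp approach work. A minimal Python equivalent of cp * watchfolder/, assuming hypothetical directory names and copying only .torrent files:

```python
import shutil
from pathlib import Path


def load_into_watchfolder(src_dir: str, watch_dir: str) -> list[str]:
    """Copy every .torrent file from src_dir into the client's watch folder."""
    src, dst = Path(src_dir), Path(watch_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.glob("*.torrent")):
        shutil.copy2(f, dst / f.name)  # a client watching dst should pick these up
        copied.append(f.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths: a folder of downloaded .torrent files and the client's watch dir.
    names = load_into_watchfolder("libgen_torrents", "watchfolder")
    print(f"queued {len(names)} torrents")
```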

Thanks! I don't have 32 TB free locally at the moment, but I might soon. If and when that happens, I'll get in touch :)