by turc1656 2390 days ago
I don't see anyone mentioning the possibility of posting this data to Usenet, at minimum for archival purposes, which should be good for ~8-9 years. That way at least the data isn't lost. With so many of those torrents having 0 or 1 seeds, this is a serious risk I think, despite the comments elsewhere about people rotating what they seed.

I realize that doesn't solve the access problem for most people, as most of the users who need this research might not know how to use Usenet or even be familiar with it at all, but I think the first major concern would be to secure the entire repository on a stable network. Usenet seems like a good place for that even if it doesn't serve as a means of distribution. Encrypting the uploads would make them immune to DMCA takedowns, provided that the decryption keys weren't made public and were only shared with individuals involved in maintaining the LibGen project.


Two thoughts on that. Encoding it to a text format with CRC data for posting to Usenet is highly inefficient in terms of data storage. And 33TB of stuff is not going to be retained for 8-9 years. Last I checked, because of the huge volume of binaries traffic, the major commercial Usenet feed providers have at most 6-9 months of retention for the major binary groups. Beyond that it becomes cost-prohibitive for them in terms of disk storage requirements. This is not an issue for the majority of their customers: 6-9 months is more than long enough retention to go find a 40GB 2160p copy of some recently-released-on-bluray movie.
Entirely agree about the lack of efficiency. No question about that.

However, in my personal experience, I have seen no issues downloading old data from any binary group. At least not with the provider I have. In fact, just this past week I obtained something sizable (several GBs) with no damaged parts, so I didn't even need the parchive recovery files at all. This has always been my experience. I've never seen anything like the pruning you are talking about. That sounds more like an issue with your specific provider to me.

yEnc overhead is about 2% and there are plenty of providers with ~10 year retention.
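
For a rough sense of where a figure like that comes from, here is a minimal sketch of yEnc body encoding in Python. It assumes the basic scheme (add 42 to each byte, escape NUL, LF, CR and `=`, break lines with CRLF) and leaves out the `=ybegin`/`=yend` headers and the CRC. The function names and the 128-character line length are illustrative assumptions, and the exact overhead depends on the line length and the data.

```python
def yenc_encode(data: bytes, line_length: int = 128) -> bytes:
    """Simplified yEnc body encoding (no =ybegin/=yend headers, no CRC)."""
    critical = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, '='
    out = bytearray()
    col = 0  # encoded characters written on the current line
    for b in data:
        e = (b + 42) % 256
        if e in critical:
            # Escape: '=' followed by the encoded byte shifted by 64.
            out.append(0x3D)
            out.append((e + 64) % 256)
            col += 2
        else:
            out.append(e)
            col += 1
        if col >= line_length:
            out += b"\r\n"
            col = 0
    if col:
        out += b"\r\n"  # terminate a final partial line
    return bytes(out)


def overhead_ratio(data: bytes, line_length: int = 128) -> float:
    """Fraction by which the encoded body is larger than the input."""
    return len(yenc_encode(data, line_length)) / len(data) - 1.0


if __name__ == "__main__":
    # Every byte value equally often, as a stand-in for compressed or encrypted data.
    sample = bytes(range(256)) * 4096
    r = overhead_ratio(sample)
    print(f"overhead: {r:.2%}")
    print(f"33 TB would become roughly {33 * (1 + r):.2f} TB once encoded")
```

Only 4 of the 256 byte values need escaping, so most of the overhead in this sketch is the CRLF at the end of each line; longer lines bring the figure down.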
Wow. I can't even imagine how much disk space ten years of retention of alt.binaries.* takes up. It's been literally ten years since I last did anything serious Usenet-related.
At least in my experience, providers with 10-year retention ask for more money and provide a smaller high-speed bandwidth allowance (after which your up/down speed is usually limited to around 10 or 1 Mbps).