Hacker News
by Invictus0 2394 days ago
I think the duplication issue is probably overstated. I doubt tackling that would shave off more than 20% of the total backup size.
3 comments

Speaking from personal experience, I usually see several results for any search. Granted, there's a big selection bias there, but 20% seems way too small.
That's because you, or anyone, are most likely to search for relatively popular books, so those books will have multiple copies. But for every popular book there are many unpopular but still useful books that only have a single copy.
To be fair, for textbooks at least, I often see several results, but often of different editions (1x edition 1, 2x edition 2, 1x edition 3, etc.). In some cases I think it's worthwhile keeping the different editions around, unless it becomes a huge burden.
Usually the different results have meaningful differences: often a different edition, translator, etc.
In my experience it's different editions or mirrors.
It's probably more of a nuisance for people wanting to use the content. E.g., copies with different metadata or tags.
20% is not insignificant.
Forking LibGen to save 20% of the total file size would be counterproductive. Yes, you save some storage, but the network effect is more important: people's willingness to contribute to "the one true thing" actually brings in more seeders than the 20% savings would.