| HN Mirror



	by Invictus0 2394 days ago
	I think the duplication issue is probably overstated. I doubt tackling that would shave off more than 20% of the total backup size.

3 comments

dooglius 2394 days ago

Speaking from personal experience, I usually see several results for any search. Granted, there's a big selection bias there, but 20% seems way too small.


abdullahkhalids 2394 days ago

That's because you, or anyone, are most likely to search for relatively popular books, so those books will have multiple copies. But for every popular book, there are many unpopular but still useful books that have only a single copy.


sgillen 2394 days ago

To be fair, for textbooks at least, I often see several results, but often of different editions (1x edition 1, 2x edition 2, 1x edition 3, etc.). In some cases I think it's worthwhile keeping the different editions around, unless it becomes a huge burden.


MiroF 2394 days ago

Usually the different results have meaningful differences, often a different edition, translator, etc.


asdff 2394 days ago

In my experience it's different editions or mirrors.


throwaway894345 2394 days ago

It's probably more of a nuisance for people wanting to use the content. E.g., copies with different metadata or tags.


driverdan 2394 days ago

20% is not insignificant.


roland00 2394 days ago

Forking LibGen to save 20% of the file size would be counterproductive. Yes, you save some storage, but the network effect is more important: the people willing to contribute to "the one true thing" provide more seeders than the 20% savings would.
