by legatus 2389 days ago
There are groups working on data curation as well, though it is much harder. LibGen grows by about 230 GB per month, while SciMag grows by around 1.10 TB per month. We should expect those numbers to increase in the future. The man-hours required to curate those databases may very well cost much more than the storage and bandwidth spent on duplicates and incorrectly tagged files. In any case, as I said, there are people seriously interested in curating the LibGen database, though most efforts I know of are still in the earliest stages.
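For a rough sense of scale, the monthly rates quoted above can be turned into yearly figures. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and constant monthly rates, so it is probably a lower bound if the rates keep rising. The function name `yearly_growth_tb` is made up for illustration.

```python
# Back-of-the-envelope yearly growth from the monthly rates quoted above.
# Assumptions (not from the comment): decimal units (1 TB = 1000 GB) and
# constant monthly rates, so the result is likely a lower bound.
GB_PER_TB = 1000


def yearly_growth_tb(monthly_gb: float) -> float:
    """Return the growth over 12 months, in TB, for a constant monthly rate in GB."""
    return monthly_gb * 12 / GB_PER_TB


libgen_tb = yearly_growth_tb(230)               # LibGen: about 230 GB per month
scimag_tb = yearly_growth_tb(1.10 * GB_PER_TB)  # SciMag: around 1.10 TB per month

print(f"LibGen: {libgen_tb:.2f} TB/year")
print(f"SciMag: {scimag_tb:.2f} TB/year")
print(f"Total:  {libgen_tb + scimag_tb:.2f} TB/year")
```

Under those assumptions, that works out to roughly 16 TB of new data per year across both collections.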
1 comment

Do you know if they process PDFs to reduce file size?
A lot of the data is in the DjVu format, which is very efficient for scanned books.