Libgen's size is ~33TB so, no, it's not "the largest corpus of PDFs online". (Although you could argue Libgen is not really "public" in the legal sense of the word, lol.) Disregarding that, the article is great! (edit: why would someone downvote this? HN is becoming quite hostile lately)
They all probably contain lots of duplicates but...
https://annas-archive.se/datasets