Hacker News
by sporedro 673 days ago
Just wondering what do you collect? Is it mainly mirroring things like libgen?

I have a decent collection of ebooks/pdfs/manga from reading. But I can’t imagine how large a 20TB library is.

2 comments

> Just wondering what do you collect?

I can't speak for the OP, but you can buy old out-of-print magazines scanned as PDFs on optical media.

I bought the entirety of Desert Magazine from 1937 to 1985. It arrived on something like 15 CD-ROMs.

I dragged and dropped the entire collection into iBooks, and I read the issues when I'm on the train.

(Yes, they're probably on archive.org for free, but this is far easier and more convenient, and I prefer to support publishers rather than undermine their efforts.)

Yep, a good bit of them are from sources like this :)
No torrents at all in this data, all publicly available/open access. Mostly scientific PDFs, and a good portion of those are scans, not just text. So the actual amount of text is probably pretty low compared to the total. But still, there's a lot more than 8TB of raw data out there. I bet the total number of PDFs is close to a petabyte if not more.
> I bet the total number of PDFs is close to a petabyte if not more.

That's a safe bet. I've seen PDFs in the GBs from users treating it like a container format (which it is).

It's probably tens of petabytes if not more, if you count private PDFs: invoices, order confirmations, contracts. There's just so, so much.