Hacker News
by kortilla 459 days ago
The entire Library of Congress is like 10 TB. You don’t need anything near petabytes until you get out of text into rich media.
1 comment

Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.