by uniqueuid 1574 days ago
Just to point this out: on a technical level, the Internet Archive has very (!) little storage overhead.

Crawled data is de-duplicated at the request level, and response payloads can be gzipped individually in addition to per-archive-file compression. [1]
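
To make that concrete, here is a rough Python sketch of the two ideas: skip storing a payload that was already captured, and gzip each record on its own so the archive is just a run of concatenated gzip members. The record layout, the `pack_records` name, and the "revisit" pointer line are simplified assumptions for illustration, not the actual on-disk format from the standard.

```python
import gzip
import hashlib
import io


def pack_records(responses):
    """Pack (url, payload_bytes) pairs into one archive blob.

    Hypothetical, simplified layout: one header line per record,
    followed by the payload only when it has not been seen before.
    """
    seen = {}  # payload digest -> URL of the first capture
    out = io.BytesIO()
    for url, payload in responses:
        digest = hashlib.sha1(payload).hexdigest()
        if digest in seen:
            # Duplicate payload: store a short pointer instead of the bytes.
            record = f"revisit {url} refers-to {seen[digest]} sha1:{digest}\n".encode()
        else:
            seen[digest] = url
            header = f"response {url} sha1:{digest} length:{len(payload)}\n".encode()
            record = header + payload + b"\n"
        # Each record is its own gzip member, so it can be read back
        # individually if its byte offset in the archive is known.
        out.write(gzip.compress(record))
    return out.getvalue()
```

Python's `gzip.decompress` accepts multi-member data, so `gzip.decompress(pack_records(...))` gives back all records in order. A real crawler would also keep an offset index so that a single record can be decompressed without reading the whole file.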

[1] https://www.iso.org/standard/68004.html