by ekr 4405 days ago
I don't know about this. To me, archiving everything seems like a gross inefficiency. Most of the internet is spam and advertising, and of the rest, less than 5% is actually useful information or knowledge.

Archiving books, scientific journals and the like would seem much more useful, but obviously you'd run into copyright issues.

2 comments

Agree that the highest priority should go to the "serious" stuff. However, the most interesting part of a really old magazine or newspaper, for me, is the advertising: an early-80s computer ad, say, or a 50s railroad or airline ad. I find that stuff really fascinating, and it gives more of the flavor of the era. It might have a surprising amount of value to a historian or anthropologist.
The trick is, of course, that it's nearly impossible to predict ahead of time what will be useful to someone. While you can probably sort out some of the spam, a comprehensive archiving project should probably avoid false positives when throwing things away.

Seems like a hard problem to solve. The low-hanging fruit would probably be detecting duplicates and combining them, which loses redundancy but handles all of those identical landing pages.
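
For what it's worth, here's a rough sketch of the exact-duplicate case, assuming pages are already fetched as raw bytes. The function names and example.com URLs are made up for illustration, and hashing whitespace-normalized content is just one possible choice; it only catches copies that are byte-identical after normalization, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def content_key(body: bytes) -> str:
    """Hash a page body after collapsing runs of ASCII whitespace."""
    normalized = b" ".join(body.split())
    return hashlib.sha256(normalized).hexdigest()


def group_duplicates(pages: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each content hash to the list of URLs that share that content."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[content_key(body)].append(url)
    return dict(groups)


# Hypothetical input: two identical landing pages and one distinct page.
pages = {
    "http://a.example.com/": b"<html>Buy  now!</html>",
    "http://b.example.com/": b"<html>Buy now!</html>\n",
    "http://c.example.com/": b"<html>An actual article</html>",
}
groups = group_duplicates(pages)

# Store one copy per hash, but keep every URL that pointed at it.
for digest, urls in groups.items():
    print(digest[:12], urls)
```

Keeping the full URL list per hash means you store the body once without losing the record of where each copy lived.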