I love the Wayback Machine. I wish they'd archive all pages, though, even those that don't wish to be archived, keeping them away from public view until the copyrights expire someday.
I don't know about this. To me, archiving everything seems like a gross inefficiency. Most of the internet is spam and advertising, and of the rest, less than 5% is actually useful information or knowledge.
Archiving books, scientific journals and the like would seem much more useful, but obviously you'd run into copyright issues.
Agreed that the highest priority should go to the "serious" stuff. However, the most interesting part of a really old magazine or newspaper, for me, is the advertising. For example, an early-80s computer ad, or a 50s railroad or airline ad. I find that stuff really fascinating, and it gives more of the flavor of the era. It might have a surprising amount of value to a historian or anthropologist.
The trick is, of course, that it's nearly impossible to predict what will be useful to someone ahead of time. While you can probably sort out some of the spam, a comprehensive archiving project should probably avoid false positives when throwing things away.
Seems like a hard problem to solve. The low-hanging fruit would probably be detecting duplicates and combining them, which loses redundancy but handles all of those identical landing pages.
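To make the duplicate idea concrete, here's a rough sketch of exact-duplicate detection by content hash. This is only an illustration, not how the Wayback Machine actually works: the function name dedupe_pages, the example URLs and the in-memory dicts are all made up for the example, and it only catches byte-identical pages (near-duplicates would need something fuzzier).

```python
import hashlib

def dedupe_pages(pages):
    """Store each distinct page body once, keyed by its SHA-256 digest.

    pages: iterable of (url, body_bytes) pairs (hypothetical input shape).
    Returns (store, index): store maps digest -> body,
    index maps url -> digest.
    """
    store = {}  # digest -> body, one copy per distinct content
    index = {}  # url -> digest, so every URL still resolves
    for url, body in pages:
        digest = hashlib.sha256(body).hexdigest()
        store.setdefault(digest, body)  # keep only the first copy
        index[url] = digest
    return store, index

# Two identical landing pages and one different page.
pages = [
    ("http://a.example/", b"<html>parked domain</html>"),
    ("http://b.example/", b"<html>parked domain</html>"),
    ("http://c.example/", b"<html>something else</html>"),
]
store, index = dedupe_pages(pages)
```

Every URL keeps its own index entry, so nothing is thrown away from the record of what existed; only the storage of identical bodies is shared.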
Quite coincidentally, I was just now reading an interview with Brewster Kahle, from New Scientist (23 November 2002) - back when the Wayback Machine had only 100 terabytes archived.
He said: "I guarantee that in the future researchers will curse us for having missed something absolutely critical. But only people using the archive can tell us about mistakes in what we collect. There is a cheaper alternative concept, called 'dark archiving', which means that we should not give people access to them. But preservation without access is dangerous - there's no way of reviewing what's in there."
But later on, he mentioned that: "AltaVista was the first Internet search engine that tried to be a complete index of all the pages. But what really got me was that they threw away the original pages. That grated, no end."
Aside: Kahle was one of the founders, with Danny Hillis, of Thinking Machines - the company that created the fabulous 'Connection Machine'.
The Wayback Machine is so essential to the nature of the internet that I wish it could be made into some sort of automatic, decentralized service that's part of the web itself. Imagine how much linked knowledge would permanently, irreparably disappear if this one company went out of business!