
Did you mean search bots versus other bots? Internet Archive's bot is a crawler too.

They showed no difference between search bots and archive bots. robots.txt was never for SEO alone. Sites exclude print versions so that people land on the regular pages, with more ads and links to other pages. Sites exclude search pages to conserve resources. They themselves said sites exclude large files because of cost. And they can't seriously think sites want sensitive areas like administrative pages archived.

Really, Internet Archive stopped respecting robots.txt because they wanted to archive what sites didn't want them to archive. Many sites disallowed Internet Archive specifically. Many sites allowed specific bots. Many sites disallowed all bots and meant all bots. And hiding old snapshots when a new domain owner changed robots.txt was a self-inflicted problem. robots.txt says what may or may not be crawled now; it says nothing about snapshots already taken. They knew all of this.
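
To make those three patterns concrete (blocking Internet Archive specifically, giving a named bot its own rules, blocking everyone else), here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration, and I'm assuming the archive crawler identifies itself as ia_archiver; real sites and real crawlers may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only.
# Assumption: the archive crawler uses the user-agent token "ia_archiver".
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: Googlebot
Disallow: /search
Disallow: /print/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    # Print the decision for each bot and path combination.
    for agent in ("ia_archiver", "Googlebot", "SomeOtherBot"):
        for path in ("/article", "/search?q=x", "/print/article"):
            print(agent, path, allowed(agent, path))
```

Under these rules the named search bot can fetch ordinary pages but not search results or print versions, while the archive bot and any unlisted bot are refused everything. Nothing in the file distinguishes "indexing" from "archiving"; it only says who may crawl what.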