by bluish29
728 days ago
That's a historical question. At that time, most, if not all, bots were either search engines or archivers. The file was even named "RobotsNotWanted.txt" at first but was renamed "robots.txt" for simplicity. To give another example, the Internet Archive stopped respecting it a couple of years ago, and they discuss this point (crawlers vs. other bots) here [1]. [1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
They failed to show any real difference between search bots and archive bots. robots.txt was never for SEO alone. Sites exclude print versions so that people see more ads and more links to other pages. Sites exclude search pages to conserve resources. The post itself says sites exclude large files to save costs. And they can't seriously think sites want sensitive areas, such as administrative pages, archived.
Really, the Internet Archive stopped respecting robots.txt because it wanted to archive what sites didn't want archived. Many sites disallowed the Internet Archive specifically. Many sites allowed only specific bots. Many sites disallowed all bots and meant all bots. And hiding old snapshots when a new domain owner changed robots.txt was a self-inflicted problem: robots.txt only says what may or may not be crawled now. They knew all of this.