by patmcguire 4230 days ago
Probably they don't, because so much of the web has robots.txt files like:

User-Agent: established_company
Allow: /some-stuff

User-Agent: *
Disallow: /
# keeps out filthy peasants

So you're stuck either following them, and going without data that would be offered up for free if you were someone else, or being a bad person and ignoring them. You don't really see the services that follow the rules.
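To make that concrete, here's a rough sketch of how a rule-following crawler would read a file like the one above, using Python's standard urllib.robotparser. The agent name new_crawler and the example.com URL are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt pattern from the comment, as one string.
ROBOTS_TXT = """\
User-Agent: established_company
Allow: /some-stuff

User-Agent: *
Disallow: /
# keeps out filthy peasants
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URL, chosen only for the example.
    url = "https://example.com/some-stuff/page"
    for agent in ("established_company", "new_crawler"):
        print(agent, allowed(agent, url))
```

The named agent matches its own group and gets through, while any other agent falls into the `*` group and is refused for the same URL.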

Also, there's a good paper on how much being preferred in robots.txt helps: it makes you a better product, which makes you more preferred...

https://etda.libraries.psu.edu/paper/9230/4516