by sp332
4094 days ago
I think the evidence goes the other way. With robots.txt you can ask web crawlers not to index certain pages, even when indexing them would be better for the crawlers' business, and this is widely respected. Now imagine that IIS put "* deny" in the default site config; it would get a lot less respect.
|
I think one of the reasons robots.txt is generally respected is that there's a stick behind the carrot: hypothetically (what with us all using so much cloud these days), a site administrator who notices a traffic spike consistent with a crawler ignoring robots.txt can respond by treating the requests as attacker-originated, which most "legitimate" crawlers would want to avoid.
What's the stick behind the carrot for Do Not Track?